Abstract

Two basic approaches are mainly used for probabilistic modeling of motion: stochastic,
in which the object literally makes successive random decisions using probabilities that
we choose arbitrarily, and ergodic, in which we usually assume some chaotic classical
evolution and probabilities appear while averaging over infinite trajectories. Both
approaches assume that we know the exact way the system evolves.

In contrast, this paper focuses on thermodynamical motion models, which assume maximal
uncertainty. Specifically, in the space of possible choices of transition probabilities,
we take the one optimizing entropy or free energy. An equivalent condition turns out to be
calculating transition probabilities as proportions between single steps in a canonical
ensemble of trajectories going through a given point. As a result, these probabilities
depend on the whole space, so the walker cannot directly use them. This model is
thermodynamical: only we use it, to predict the most probable behavior. Standard diffusion
models like Brownian motion can be seen as obtained by locally maximizing uncertainty.
For a regular space this agrees with the choice of transition probabilities that fully
maximizes entropy, but in general, while the local approximation leads to a nearly uniform
stationary probability, the presented approach has a strong localization property.
Specifically, its stationary probability density is given by the squares of the coordinates
of the minimal energy eigenvector/eigenfunction of the Hamiltonian for the given situation,
such as Bose-Hubbard or Schrödinger, finally reaching agreement with the thermodynamical
predictions of quantum mechanics. It also provides a natural intuition about the squares
relating amplitudes and probabilities.

We will mainly focus on a deep understanding of the discrete case, which is mathematically
simpler: the space is a graph and the question is how to assign probabilities to its
edges. The basic Maximal Entropy Random Walk choice will be derived and discussed
in general form, including asymmetric graphs, multi-edge graphs, periodic graphs and
various transition times.

Later it will first be expanded to emphasize some paths by using potentials, and then,
after taking the infinitesimal limit, we will get Schrödinger's case. Considering a
time-dependent potential will lead to a probability current similar to that of quantum
mechanics, and to thermodynamical analogues of the Ehrenfest equation, the momentum
operator and the Heisenberg principle. Then we will naturally generalize to the
multiple-particle case by considering ensembles of histories of configurations instead of
trajectories. We will first focus on a fixed number of particles and then, by introducing
creation/annihilation operators, we will get to the Bose-Hubbard Hamiltonian for various
numbers of particles.
1 Introduction

Two probabilistic approaches are mostly used to model motion. On the one hand there are
diffusion/stochastic approaches, in which we assume that the object literally makes
successive random decisions, according to local transition probabilities that we choose
arbitrarily. On the other hand there are classical chaos models, in which we usually assume
some deterministic evolution and the probability density appears on the ergodic level:
while averaging position over an infinite trajectory. These models assume that we know and
control the exact way the system evolves, while in real physics there is usually an
additional large number of degrees of freedom, hidden from us, which in practice can be
considered only as thermal fluctuations.

The above approaches use the strong assumption that we know the exact evolution model. In
contrast, in thermodynamics we assume maximal uncertainty: for example, if there is no
basis to emphasize some scenarios, we should assume a uniform probability distribution
among the possibilities. So thermodynamics is not able to predict the exact situation, but
only the most probable set of probabilistic parameters, like a density function. The
standard application of this philosophy is the static picture: a canonical ensemble of
possible configurations at a single moment.

In this paper the thermodynamical approach is applied to model motion: to find the most
probable probabilistic description of dynamics in situations where there is no basis for
strong assumptions like those made by models using the diffusion or chaos approaches. Our
considerations will be based on thermodynamical principles such as maximizing entropy
production or, more generally, minimizing free energy. This condition turns out to be
equivalent to assuming a canonical ensemble of possible scenarios, which this time are not
static but dynamical: we will assume a Boltzmann distribution among dynamical scenarios,
like trajectories or histories of configurations.

We base our considerations on local transition probabilities, as in diffusion models.
However, there are essential differences between the values and interpretations of the two
approaches. This time the local probabilistic rules are not chosen arbitrarily as usual,
but are found according to thermodynamical principles: as a proportion between
infinitesimal steps in a canonical ensemble of possible paths going through a given point,
as in Fig. 1. Considering an ensemble of whole paths requires knowing the whole space; in
contrast to the diffusion approach, this time the object cannot have this nonlocal
knowledge. Generally, direct use by
Figure 1: Different philosophies of probabilistic approaches to motion.

Figure 2: Two of the possible ways of choosing transition probabilities on a given graph,
and an example of the probability density evolution they produce for a 2D lattice with
cyclic boundary conditions, in which all vertices except the marked defects have an
additional self-loop (an edge to itself).



the object of the calculated probabilities is not the essence of thermodynamical models.
They assume instead that the object just chooses a trajectory in a way that is too complex
or uncontrollable, so we should assume a uniform or Boltzmann distribution among the
possible trajectories which the object could choose. The obtained probabilities are only to
be used by us, to find the most probable behavior.

We will see that the standard "static" statistical physics picture and diffusion models can
be seen as a local approximation of the maximal uncertainty principle. In many situations,
like a regular space or lattice, both approaches lead to the same predictions, but
irregularities mean that, while locally they might look similar, they usually have
drastically different global behavior. For example, while diffusion leads to a nearly
uniform stationary density, densities in models that fully maximize entropy usually
localize strongly in the largest defect-free region. Figure 2 shows an example of this
surprising difference for the two basic models we will consider: Generic Random Walk (GRW),
as a representative of the standard approach that maximizes uncertainty locally (leading to
Brownian motion in the infinitesimal limit), and Maximal Entropy Random Walk (MERW), as the
basis of all thermodynamical motion models we will consider.

The natural question is: which approach corresponds better to reality? If theoretical
reasoning is not convincing enough, let us compare this huge difference in the predicted
thermal equilibrium with the expectations of another basic tool used to model reality,
namely quantum mechanics. It predicts that a system at rest releases its excess energy and
finally deexcites to the ground state thermal equilibrium. We will see that the stationary
probability densities predicted by the MERW-based models are the squares of the coordinates
of the lowest energy eigenvector/eigenfunction of the Hamiltonian for the given situation.
For example, in contrast to the standard approach, the stationary probability density
agrees with the thermodynamical predictions of quantum mechanics for the Bose-Hubbard or
Schrödinger cases. In an analogous experimental situation, the strong localization property
can be seen for example in recent STM measurements
Figure 3: Left: Shannon entropy for the $(p, 1 - p)$ probability distribution
($\lg(x) := \log_2(x)$). Right: schematic distribution of subset size while restricting the
set of length-$n$ sequences of 0/1 to those having a fraction $p$ of "0"; the Gaussian
distribution degenerates to a Dirac delta in the $n \to \infty$ limit.

of electron density in a defected semiconductor lattice [1]. The general conclusion is that
if we want agreement between statistical physics and the thermodynamical predictions of
quantum mechanics, we should use an ensemble not of static scenarios, but of dynamical ones.
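The contrast between a nearly uniform GRW density and a localized MERW density can be checked numerically on a toy example. The sketch below is only an illustration, not the simulation behind Figure 2: it uses a 1D ring of 20 sites instead of a 2D lattice, with two arbitrarily placed defects, and it assumes the standard closed forms for an undirected graph (GRW stationary density proportional to vertex degree, MERW stationary density proportional to the squared dominant eigenvector of the adjacency matrix) rather than deriving them. All function names are hypothetical.

```python
def ring_with_defects(n, defects):
    """Adjacency matrix of a cyclic 1D lattice; every site except the defects has a self-loop."""
    M = [[0] * n for _ in range(n)]
    for i in range(n):
        M[i][(i + 1) % n] = 1
        M[i][(i - 1) % n] = 1
        if i not in defects:
            M[i][i] = 1
    return M


def dominant_eigenvector(M, iters=5000):
    """Power iteration on M + I; the shift by I only guarantees convergence."""
    n = len(M)
    v = [1.0] * n
    for _ in range(iters):
        w = [v[i] + sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        top = max(w)
        v = [x / top for x in w]
    return v


def grw_density(M):
    """Stationary density of GRW on an undirected graph: proportional to degree."""
    d = [sum(row) for row in M]
    total = sum(d)
    return [x / total for x in d]


def merw_density(M):
    """Assumed MERW closed form for symmetric M: squared dominant eigenvector."""
    psi = dominant_eigenvector(M)
    norm = sum(x * x for x in psi)
    return [x * x / norm for x in psi]


# Defects at sites 0 and 7 split the ring into regions of 6 and 12 sites.
M = ring_with_defects(20, defects={0, 7})
grw = grw_density(M)
merw = merw_density(M)
print("GRW  max/min density ratio:", max(grw) / min(grw))
print("MERW max/min density ratio:", max(merw) / min(merw))
print("MERW density peaks at site:", merw.index(max(merw)))
```

In this toy setting the GRW density only distinguishes degree 2 from degree 3, while the MERW density peaks inside the longer defect-free segment.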

The basis of such approaches is the maximum uncertainty principle: when we have no
additional information, we should assume a uniform probability distribution among the
possible scenarios. If we would like to model our system using some parameterized family of
statistical models, this principle means that we should use the set of parameters
maximizing entropy. For example, if there is some completely unknown length-$n$ sequence of
0/1 symbols, the number of possibilities is $2^n$. If we restrict to sequences in which a
fraction $p \in [0, 1]$ of the symbols are "0", the asymptotic behavior of their number is:

$$\binom{n}{pn} \approx 2^{n h(p)}$$

where

$$h(p) := -p \lg(p) - (1-p) \lg(1-p)$$

is Shannon's average entropy production, which has a single maximum: 1 (bit of information
per symbol) for $p = 1/2$. We use the notation $\lg(x) := \log_2(x)$. So if among all
possible 0/1 sequences we restrict to only those having $p$ very near 1/2, this
generic-looking subset in fact asymptotically contains practically all sequences. Assuming
a different probability or some unjustified correlations would reduce the average entropy
production, which is the parameter in the exponent above: the statistical model which
maximizes entropy asymptotically completely dominates all the others. Such universal,
purely combinatorial domination is much stronger than merely representing our knowledge: if
there are no physical reasons to emphasize some patterns, complex uncontrolled evolution
should lead with the same probability to any of the possible sequences. For example, when
counting patterns in some sequence of noninteracting objects created by nature, the average
number of patterns should asymptotically lead to the conclusion that the sequence is
uncorrelated (the so-called asymptotic equipartition principle). The situation becomes more
complicated if there is dynamics involved: we will see that what the standard approach to
stochastic modeling unknowingly does is analogous to assuming here not $p = 1/2$ but an
approximate value.
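The combinatorial domination described above can be checked directly with exact integer arithmetic. The following is a minimal sketch, assuming $n = 1000$ and a few arbitrarily chosen values of $p$; the helper names `h` and `rate` are hypothetical. It compares $\lg\binom{n}{pn}/n$ with $h(p)$ and computes which fraction of all $2^n$ sequences has its share of zeros within 0.05 of 1/2.

```python
from math import comb, log2


def h(p):
    """Shannon entropy (in bits) of the distribution (p, 1 - p)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def rate(n, k):
    """lg of the number of length-n 0/1 sequences with exactly k zeros, per symbol."""
    return log2(comb(n, k)) / n


n = 1000
for k in (100, 250, 500):
    p = k / n
    print(f"p={p:.2f}  lg(C(n,pn))/n={rate(n, k):.4f}  h(p)={h(p):.4f}")

# Fraction of all 2^n sequences whose share of zeros lies in [0.45, 0.55].
near_half = sum(comb(n, k) for k in range(450, 551))
fraction = near_half / 2**n
print("fraction of sequences with p in [0.45, 0.55]:", fraction)
```

The per-symbol rate stays slightly below $h(p)$ for finite $n$, because the approximation above neglects a factor that is only polynomial in $n$.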

We will start our considerations with the discrete situation, obtained for example by
discretization of a continuous system, such as assigning vertices to subsets of
possibilities and choosing an adjacency matrix describing the possible transitions
($M_{ij} \in \{0, 1\}$). For this graph, we would like to
choose transition probabilities: for each allowed transition $(i, j)$ with $M_{ij} = 1$,
choose a probability $S_{ij}$, normalized for each vertex ($\sum_j S_{ij} = 1$). Obviously
there is large freedom in the choice of this matrix $S$. The standard approach maximizes
uncertainty locally by assuming that, for each vertex, each outgoing edge is equally
probable. This choice is sometimes called "a drunken sailor"; here we will call it Generic
Random Walk (GRW). In the infinitesimal limit it leads to Brownian motion. It can be seen
that for each vertex we maximize entropy production for the next choice. However, it turns
out that this local approximation does not maximize the average entropy production
$H(S) := -\sum_i \pi_i \sum_j S_{ij} \lg(S_{ij})$, where $\pi$, satisfying
$\sum_i \pi_i S_{ij} = \pi_j$, is the stationary probability distribution which this
stochastic process leads to. $H(S)$ can be seen as the average entropy per step in the
ensemble of paths produced by this choice of transition probabilities. So maximizing $H(S)$
in the space of all possible $S$ for a given graph means choosing probabilities such that
all possible paths on this graph become equally probable. We will see that, as in Fig. 1,
we can also find this $S$ by direct calculation of the proportions of single steps inside a
uniform ensemble of full paths, infinite in both directions. Such a choice of $S$ will be
called Maximal Entropy Random Walk (MERW), and it can be determined for example by the
condition that, for each two points, each path of a given length between them is equally
probable.
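The difference between the two choices of $S$ can be made concrete on a very small graph. The sketch below is an illustration under stated assumptions: it uses the path graph 0 - 1 - 2 - 3 - 4, and for MERW it assumes the standard closed form $S_{ij} = \frac{M_{ij}}{\lambda}\frac{\psi_j}{\psi_i}$ with stationary density $\pi_i \propto \psi_i^2$, where $(\lambda, \psi)$ is the dominant eigenpair of a symmetric $M$; this form is not yet derived at this point of the text. Function names are hypothetical.

```python
from math import log2


def dominant_eigen(M, iters=5000):
    """Power iteration on M + I (the shift avoids oscillation on bipartite graphs)."""
    n = len(M)
    v = [1.0] * n
    top = 1.0
    for _ in range(iters):
        w = [v[i] + sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        top = max(w)
        v = [x / top for x in w]
    return top - 1.0, v


def grw(M):
    """GRW: each outgoing edge equally probable; stationary density ~ degree (symmetric M)."""
    n = len(M)
    d = [sum(row) for row in M]
    S = [[M[i][j] / d[i] for j in range(n)] for i in range(n)]
    pi = [x / sum(d) for x in d]
    return S, pi


def merw(M):
    """MERW via the assumed closed form S_ij = M_ij psi_j / (lambda psi_i)."""
    n = len(M)
    lam, psi = dominant_eigen(M)
    S = [[M[i][j] * psi[j] / (lam * psi[i]) for j in range(n)] for i in range(n)]
    norm = sum(x * x for x in psi)
    pi = [x * x / norm for x in psi]
    return S, pi, lam


def entropy_rate(S, pi):
    """H(S) = -sum_i pi_i sum_j S_ij lg(S_ij), in bits per step."""
    n = len(S)
    return -sum(pi[i] * S[i][j] * log2(S[i][j])
                for i in range(n) for j in range(n) if S[i][j] > 0)


# Path graph 0 - 1 - 2 - 3 - 4.
M = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
S_grw, pi_grw = grw(M)
S_merw, pi_merw, lam = merw(M)
H_grw = entropy_rate(S_grw, pi_grw)
H_merw = entropy_rate(S_merw, pi_merw)
print("H(GRW) =", H_grw, " H(MERW) =", H_merw, " lg(lambda) =", log2(lam))

# Two length-2 paths from vertex 1 back to vertex 1: 1-0-1 and 1-2-1.
grw_a, grw_b = S_grw[1][0] * S_grw[0][1], S_grw[1][2] * S_grw[2][1]
merw_a, merw_b = S_merw[1][0] * S_merw[0][1], S_merw[1][2] * S_merw[2][1]
print("GRW  path probabilities:", grw_a, grw_b)
print("MERW path probabilities:", merw_a, merw_b)
```

Under GRW the two paths of equal length between the same endpoints get different probabilities, while under MERW they coincide, which is the defining condition quoted above.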

So while we should use GRW only if the walker indeed uses exactly the given transition
probabilities, MERW should be used (by us only) if there is no basis to assume any local
probabilistic rules. There are obvious cases where this condition fails, for example if the
walker indeed throws a die at each intersection in order to use GRW directly. Generally
this "no contraindications" condition is extremely subtle, and there are hardly any simple
rules for deciding whether hidden local probabilistic rules are involved. One suggestion
for when to use maximal uncertainty is to compare its results with the predictions of other
theories: the mentioned agreement with the thermodynamical equilibrium of quantum mechanics
suggests using it for quantum-scale objects. Another criterion can be based on the fact
that, while GRW emphasizes a concrete discrete distance to the neighboring vertices, we
will see that MERW can be derived as its scale-invariant limit in which this characteristic
length goes to infinity. So if the walker is a person, he thinks, among other things, in
terms of single discrete choices, which suggests shifting toward GRW-like local models. On
the other hand, an example is provided by an electron in a crystal lattice: it behaves
mainly according to the electromagnetic field generated by all the atoms, so even if there
is a discrete lattice there, the system remains deeply continuous, which suggests using the
MERW-like approach. Of course there remains a large spectrum of possibilities between these
extreme choices; for example, we could maximize entropy under some local probabilistic
constraints to model some concrete situation.

We need to keep in mind that assuming such transition probabilities does not mean that
the walker directly uses them; it could even choose the path in some deterministic way.
This model is thermodynamical: it only represents our knowledge, used to predict the most
probable evolution according to the information we have.

Abstract ensembles of four-dimensional scenarios also bring a natural intuition about the
Born rule: the squares relating amplitudes and probabilities when focusing on a
constant-time cut of such an ensemble. At a given moment, the past and future half-paths of
the abstract scenarios we consider meet. We will see that the lowest energy eigenvector of
the Hamiltonian (the amplitude) is the probability density at the end of one of these
ensembles of half-paths, past or future, taken separately. Now the probability of being at
a given point at that moment is the probability of reaching it from the past ensemble,
multiplied by the same value for the ensemble of future scenarios we consider: it is the
square of the amplitude.
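The half-path picture can be tested by exact path counting on a small graph. The sketch below is a toy illustration with assumed details: a ring of 6 sites with self-loops everywhere except at one defect (site 0), a uniform ensemble of all paths of length $2l$ with $l = 60$, and the dominant eigenvector of the adjacency matrix playing the role of the amplitude. For each vertex it multiplies the number of length-$l$ paths ending there by the number starting there, and compares the resulting distribution of the central position with the squared eigenvector. The names used are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product; stays in exact integers when M and v are integer."""
    n = len(M)
    return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]


# Ring of 6 sites; all sites except the defect at 0 have a self-loop.
n = 6
M = [[0] * n for _ in range(n)]
for i in range(n):
    M[i][(i + 1) % n] = 1
    M[i][(i - 1) % n] = 1
    if i != 0:
        M[i][i] = 1

# half[i] = number of length-l paths starting at i; since M is symmetric,
# it is also the number of length-l paths ending at i.
l = 60
half = [1] * n
for _ in range(l):
    half = matvec(M, half)

# Uniform ensemble of length-2l paths: probability that the central vertex is i
# is (past half-paths ending at i) * (future half-paths starting at i).
weights = [x * x for x in half]
total = sum(weights)
center = [w / total for w in weights]

# Dominant eigenvector by power iteration on M + I (shift only helps convergence).
psi = [1.0] * n
for _ in range(5000):
    w = [psi[i] + x for i, x in enumerate(matvec(M, psi))]
    top = max(w)
    psi = [x / top for x in w]
norm = sum(x * x for x in psi)
psi_squared = [x * x / norm for x in psi]

for i in range(n):
    print(i, round(center[i], 6), round(psi_squared[i], 6))
```

The density at the cut is lowest at the defect and highest at the site opposite to it, and it matches the squared eigenvector up to the finite-$l$ error.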

In physics, the uniform distribution among scenarios is usually replaced by the Boltzmann
distribution; a potential will be introduced on the graph for this purpose. Thanks to it,
when taking the infinitesimal limit of graphs that are regular lattices, the Hamiltonian
becomes the standard one from Schrödinger's equation. So the model says, for example, that
from a purely thermodynamical point of view, when considering a corpuscular electron in a
proton's potential, the best assumption is a dynamical equilibrium state having the
probability density of the quantum ground state.

This consequence of assuming only a canonical ensemble of possible trajectories rightly
brings to mind Feynman's Euclidean path integrals ([2]). While they are mathematically very
similar, there are also differences. One of them is the underlying philosophy: path
integrals are imagined as obtained by assuming the axioms of quantum mechanics and then
making a philosophically problematic Wick rotation of time onto the imaginary axis. In
contrast, the presented approach uses only mathematically universal principles of
thermodynamics: it does not assume the axioms of quantum mechanics, but derives their
thermodynamical consequences. Another difference is that path integral considerations start
with continuous physics, while here we focus rather on the discrete case, which allows for
additional intuitions and understanding of mathematical nuances. There is also an essential
mathematical difference between the propagators of these approaches: the one from the
Euclidean path integral is not properly normalized to be a stochastic propagator. In the
presented approach there appears the required additional term ($\psi_0(y)/\psi_0(x)$),
carrying the nonlocality of this effective model: it depends on the ground state
eigenfunction, which depends on information about the whole system. Besides nonlocality,
other seemingly problematic effects from quantum mechanics also appear, like retrocausality
in the recently confirmed ([3]) Wheeler's experiment. We need to remember that these models
are effective: they only represent our knowledge, and so we cannot infer that such effects
come directly from the underlying physics. Nonlocality/retrocausality of a model
representing our knowledge means only that some nearby experience may bring information we
were missing about some distant/past situation.

A different concept which might seem connected is Nelson's stochastic interpretation
of quantum mechanics ([4]). I would like to distinguish the models considered here from
such ambitious approaches to recreating the whole of quantum mechanics. The goal of this
paper is only to improve stochastic modeling by not choosing transition probabilities
arbitrarily as usual, but finding them according to thermodynamical principles instead. The
resulting models agree with the predictions of quantum mechanics only where their areas of
focus intersect (thermal equilibrium), but generally there are essential differences
between them; for example, deexcitation is a continuous process here. Mathematically closer
is the so-called "Euclidean quantum mechanics" of Zambrini ([5]), where formulas similar to
those here can be found for a single particle in the time-independent continuous case.
There are also essential differences: mainly, as in Nelson's interpretation, the motivation
is resemblance to quantum mechanics, and instead of standard evolution a so-called
Bernstein process is used, in which the situation in both past and future (simultaneously)
is used to find the current probability density.
The disagreement of standard stochastic models (which only approximate thermodynamical
principles) with quantum predictions is one of many reasons for the reluctance to imagine
the electron as a particle: an indivisible charge carrier, of radius so small that it is
practically unmeasurable in particle colliders. The orthodox view of quantum mechanics
leads physicists to often try to forget about this half of wave-particle duality. The need
to see the electron as only a wave no longer applies to macroscopic physics. For example,
in the defected lattice of a semiconductor or in an optical lattice there is some concrete
spatial density of particles, and we should be able to imagine electrons or atoms hopping
between sites, as in the Bose-Hubbard model. So there should also be some stochastic
description of such hopping, which finally appears naturally using the presented dynamical
thermodynamical approach.

While experiments requiring the simultaneous presence of both natures have been designed
and conducted (e.g. the Afshar experiment), the orthodox view of wave-particle duality is
that the particle has just one of these natures at a given moment. However, there are only
vague conditions for which one exactly, such as that the electron is a wave near a nucleus
or while traveling through an unknown path, and a corpuscle if we know something about this
path that prevents interference. Even more difficult would be the question of the mechanism
for changing this nature in continuous physics. A much less problematic view was started by
de Broglie ([6]) in his doctoral thesis: that with the particle's energy ($E = mc^2$) there
should come some internal periodic process ($E = \hbar\omega$), and so periodically created
waves around it, adding a wave nature to this particle so that it has both simultaneously.
Such an internal clock is also expected from the Dirac equation as Zitterbewegung
(trembling motion). Recently it was observed by Gouanere ([7]) as increased absorption of
81 MeV electrons when this "clock" synchronizes with the regular structure of the barrier.
A similar interpretation of wave-particle duality (using an external clock instead) was
recently used by the group of Couder to simulate quantum phenomena with macroscopic
classical objects: droplets on a vibrating liquid surface. The fact that they are coupled
with the waves they create allowed the observation of interference ([8]) in the statistical
pattern of the double-slit experiment, an analogue of tunneling ([9]), in which the
behavior depends in a complicated way on the history stored in the field, and finally
quantization of orbits ([10]): to find a resonance with the field, the clock needs to make
an integer number of periods while making an orbit.

As for tunneling in Couder's paper, such waves also act as fundamental noise that is
practically unpredictable for us, and thermodynamical models are used to handle such
situations. The proper constant to get the Schrödinger equation from MERW was obtained by
the choice of the proportion between the time and space lattice steps in the infinitesimal
limit, not requiring any specific thermodynamical $\beta$. It could be misleading, but the
similarity to the quantum formalism for time-dependent considerations suggests choosing
$\beta = 1/\hbar$. In thermodynamics $\beta$ is related to temperature ($T$):
$\beta = \frac{1}{k_B T}$. In the standard view this temperature describes the average
energy of microscopic degrees of freedom as $\frac{1}{2} k_B T$. In our case it is not the
standard energy, but the energy of a path (action): energy multiplied by time. For example,
if we choose this time as the period ($1/\nu$) of some periodic process like the internal
clock, we get the average energy as $\frac{1}{2}\hbar\nu$: the level of uncertainty
provided by the wave nature of particles. A surprising observation is that while these
thermodynamical models completely ignore the wave nature, which seems to be required for
the orbit quantization condition, they already "see" the structure of eigenstates.

The basic MERW formulas have been known at least since 1984, for generating the uniform
path distribution required in Monte Carlo simulations ([11]). However, using them purely
for stochastic modeling seems to have appeared only in recent years ([12], [13], [14]). A
simplified derivation of the basic extensions, adding a potential and taking the
infinitesimal limit to get the Schrödinger equation, can be found in [fTS]. Some discussion
of its connection with quantum mechanics can be found in [15]. In the present paper the
considerations are made in a more formal way, and some generalizations are discussed:
multi-edge graphs, directed graphs, periodic graphs, various transition times, the
time-dependent case and multiple particles.

The second section contains preliminary definitions for graphs and stochastic models on
them, and the Frobenius-Perron theorem with a discussion of periodic graphs. It also
introduces a convenient interpretation of multi-edge and weighted graphs, in which the
number of paths can be defined in two ways, called paths and pathways for distinction.

Section 3 concentrates on the basic MERW and its comparison with GRW. It contains two
different derivations of MERW: as the scale-invariant limit of GRW, and by assuming a
uniform probability distribution among possible paths. Combinatorial entropy will be
discussed, especially from the point of view of random walks. A convenient way to see the
essential difference in behavior between these two approaches to random walk is through
their localization properties: numerical simulations for defected lattices are presented
and discussed. These examples also introduce the potential in a combinatorial way, mainly
to prepare for the more physical way in the following sections. To keep this section purely
combinatorial, it is the only one which uses the multi-edge interpretation of weighted
graphs.

In comparison, Section 4 introduces a more physical interpretation of weighted graphs,
which will also be used in later sections: as assuming a Boltzmann distribution among
possible paths. It considers lattice graphs with a physical potential and takes the
infinitesimal limit, deriving deexcitation to the ground state probability density of the
Schrödinger equation.

Section 5 generalizes these considerations to the time-dependent case. It starts with the
discrete case, using time-dependent analogues of eigenvectors, and then the infinitesimal
limit is discussed. During rapid changes of the potential, a difference appears between the
past and future amplitudes. As in the stationary case, these amplitudes should be nearly
equal during relatively slow evolution, which maintains thermal equilibrium; we will call
this assumption the adiabatic approximation. Time evolution allows us to define a
thermodynamical analogue of the momentum operator ($\hbar\nabla$), which is not
self-adjoint this time. While considering the Ehrenfest equations, a very surprising result
appears: we get Newton's second law, but with the opposite acceleration. Fortunately, this
turns out to be natural in the thermodynamical case: if the probability density needs to
get to a different potential minimum, it first has to accelerate uphill in the potential,
then decelerate downhill, to finally stop in this new global minimum equilibrium state. In
the adiabatic approximation we can also introduce an analogue of the Heisenberg uncertainty
principle.

While previously a single particle in space was considered, in Section 6 generalizations to
multiple particles are discussed. Under the approximation that these particles do not
interact with each other, the obtained probability density is also the expected actual
density of such a large number of particles. Interaction appears in a way analogous to
quantum mechanics. The fact that the amplitudes are now real and positive means that we
cannot perform antisymmetrization to directly include the Pauli exclusion principle.
However, Coulomb repulsion itself is enough to forbid particles from choosing the same
state of dynamical equilibrium. A combinatorial view of annihilation/creation operators is
also presented, to finally recreate the Bose-Hubbard model. Taking the infinitesimal limit
should lead to quantum field theory analogues as a further perspective.

The last section briefly summarizes the results and suggests directions for further
development. While quantum mechanics focuses on the wave nature of particles, practically
ignoring the corpuscular one, the presented approach does exactly the opposite. A way to
combine both pictures, using soliton particle models with topological charges as quantum
numbers, will also be briefly discussed.

2 Preliminaries 

2.1 Basic definitions and properties of graphs 

We will start our considerations with the general discrete case: the walker makes
successive transitions on some discrete set of locations. Generally this set could be
infinite, as for a lattice, but for simplicity let us assume that it is finite, like a part
of a lattice with cyclic boundary conditions. The time required for different transitions
could generally vary, but for simplicity let us assume for the moment that it is constant,
so we can describe time as the set of integers ($t \in \mathbb{Z}$).

Let us assume that we have a graph $(\mathcal{V}, \mathcal{E})$ with some finite number of
vertices $\mathcal{V}$: $\#\mathcal{V} = N \in \mathbb{N}$, identified by their numbers, and
some set of edges $\mathcal{E} \subset \{1, 2, .., N\}^2$. Generally we will allow real
positive weights on these edges; natural numbers can represent multiple edges between given
vertices. Later a potential of vertices will be introduced by using edge weights like
$e^{-V_i}$.

In any case, we will identify the graph with a real nonnegative $N \times N$ matrix $M$.
The adjacency matrix $A$ of graph $M$ is defined as:

$$A_{ij} = \begin{cases} 0 & \text{if } M_{ij} = 0 \quad ((i, j) \notin \mathcal{E};\ \text{there is no edge from } i \text{ to } j) \\ 1 & \text{if } M_{ij} > 0 \quad ((i, j) \in \mathcal{E};\ \text{there is an edge from } i \text{ to } j) \end{cases} \qquad (1)$$

We will generally distinguish three types of graphs:

• simple graphs, for which there can be only a single edge between vertices:
$A_{ij} = M_{ij} \in \{0, 1\}$,

• multi-edge graphs, for which multiple edges between two vertices are also allowed:
$M_{ij} \in \mathbb{N}$,

• weighted graphs, for which $0 \leq M_{ij} \in \mathbb{R}$.

The mathematical formalism will be general, so this distinction has practically only
interpretational meaning. Weights that are natural numbers can be seen as the number of
edges, but we will see that general weights can also be imagined this way.

A transition from vertex $i$ to vertex $j$ in a multi-edge graph can be made through one of
$M_{ij}$ edges: edge $(i, j)$ corresponds to $M_{ij}$ ways of passing through it. To handle
such situations, we will distinguish paths made on the adjacency matrix from pathways,
corresponding to the number of ways a given path can be realized:
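Definition 1.

$(\gamma_i)_{i=0}^{l}$ is a length $l$ path or pathway on graph $M$ if $\forall_i\ M_{\gamma_i \gamma_{i+1}} > 0$,
the path $(\gamma_i)_{i=0}^{l}$ contains $M_{\gamma_0 \gamma_1} M_{\gamma_1 \gamma_2} .. M_{\gamma_{l-1} \gamma_l}$ pathways.

Notation: The index range in obvious cases will be omitted.

Observation 2.

$(A^l)_{ij}$ is the number of length $l$ paths from $i$ to $j$,
$(M^l)_{ij}$ is the number of length $l$ pathways from $i$ to $j$.

For example $(M^l)_{ij} = \sum_{\gamma_1, \gamma_2, .., \gamma_{l-1}} M_{i \gamma_1} M_{\gamma_1 \gamma_2} .. M_{\gamma_{l-1} j}$.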

For simple graphs there is no difference between paths and pathways. In contrast to
multi-edge graphs, for weighted graphs the above interpretation seems strained, but it will
still lead to self-consistent mathematics.
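The distinction between paths and pathways can be verified by brute force on a small multi-edge graph. The sketch below is an illustration only: the matrix $M$ is an arbitrary example, and the helper names are hypothetical. It enumerates all vertex sequences of a given length, counts paths and pathways separately, and compares the counts with the entries of the matrix powers of the 0/1 adjacency matrix and of $M$ itself.

```python
from itertools import product


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(M, l):
    """l-th power of a square integer matrix by repeated multiplication."""
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(l):
        R = mat_mul(R, M)
    return R


def count_by_enumeration(M, i, j, l):
    """Return (paths, pathways) of length l >= 1 from i to j by brute force."""
    n = len(M)
    paths = pathways = 0
    for middle in product(range(n), repeat=l - 1):
        g = (i, *middle, j)
        ways = 1
        for a, b in zip(g, g[1:]):
            ways *= M[a][b]  # number of edges realizing the step a -> b
        if ways > 0:
            paths += 1
            pathways += ways
    return paths, pathways


# Arbitrary multi-edge example: a double edge 0 -> 1 and a triple self-loop at 2.
M = [[0, 2, 1],
     [1, 0, 1],
     [1, 0, 3]]
A = [[1 if x > 0 else 0 for x in row] for row in M]

l = 4
A_l, M_l = mat_pow(A, l), mat_pow(M, l)
for i in range(3):
    for j in range(3):
        print((i, j), count_by_enumeration(M, i, j, l), (A_l[i][j], M_l[i][j]))
```

Each printed pair from the enumeration should coincide with the pair read off from the matrix powers, which is the content of Observation 2.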

The above definitions for a length $l$ path ($(\gamma_i)_{i=0}^{l}$, time from 0 to $l$) can
be naturally extended to different time segments, like $[t, t + l - 1]$, and also to
infinite paths: one-sided infinite to the past ($[-\infty, t]$) or to the future
($[t, \infty]$), and finally full paths ($[-\infty, \infty]$).

Let us define the basic concepts for graphs:
Definition 3.

Graph is called undirected if $\forall_{ij}\ M_{ij} = M_{ji}$,
Neighbors of vertex $i$ are $\mathcal{N}_i := \{j : M_{ij} > 0\}$,
Degree of vertex $i$ is $d_i := \#\mathcal{N}_i$,
$j$ is accessible from $i$ if $\exists_l\ (M^l)_{ij} > 0$,

Distance from $i$ to accessible $j$ is the minimal $l \in \mathbb{N}$ : $(M^l)_{ij} > 0$,

$(\gamma_i)_{i=0}^{l}$ path is a length $l$ loop if $\gamma_0 = \gamma_l$,
Self-loop is a length 1 loop,

Graph is called strongly connected if for all $i, j$, vertex $i$ is accessible from $j$,

Period of a strongly connected graph is the greatest common divisor of $\{l : (M^l)_{ii} > 0\}$,

$i$ and $j$ are in the same periodic component if their distance is divisible by the period $p$,

Vector $v$ is called nonnegative ($v \geq 0$) if $\forall_i\ 0 \leq v_i \in \mathbb{R}$,

Vector $v$ is called positive ($v > 0$) if $\forall_i\ 0 < v_i \in \mathbb{R}$,

Matrix $M$ is called nonnegative ($M \geq 0$) if $\forall_{ij}\ 0 \leq M_{ij} \in \mathbb{R}$,

Matrix $M$ is called irreducible if $\exists_n \forall_{ij}\ (M^n)_{ij} > 0$,

Graph is called irreducible if it is strongly connected and has period 1, or equivalently
if its adjacency matrix is irreducible.

Restrictions on self-loops are not required: transitions from a vertex directly to itself
are allowed, so the adjacency matrix may have nonzero values on the diagonal.
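Some of the notions in Definition 3 can be computed mechanically from the matrix $M$. The sketch below is a minimal illustration with hypothetical function names: it tracks only whether $(M^l)_{ij} > 0$, tests strong connectivity using walks of length up to $N$, and computes the period as the greatest common divisor of closed-walk lengths through one vertex. The cutoff $3N$ on walk lengths is an assumption chosen here as a sufficient bound for strongly connected graphs, not something taken from the text.

```python
from math import gcd


def bool_mul(A, B):
    """Boolean matrix product: entry (i, j) is True iff some k links i -> k -> j."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_matrices(M, max_len):
    """Yield (l, R) with R[i][j] True iff (M^l)_ij > 0, for l = 1..max_len."""
    A = [[x > 0 for x in row] for row in M]
    R = A
    for l in range(1, max_len + 1):
        yield l, R
        R = bool_mul(R, A)


def is_strongly_connected(M):
    """Every vertex is accessible from every vertex by a walk of length 1..N."""
    n = len(M)
    reach = [[False] * n for _ in range(n)]
    for _, R in walk_matrices(M, n):
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or R[i][j]
    return all(all(row) for row in reach)


def period(M):
    """gcd of closed-walk lengths through vertex 0 (assumes strong connectivity)."""
    n = len(M)
    p = 0
    for l, R in walk_matrices(M, 3 * n):  # assumed sufficient cutoff
        if R[0][0]:
            p = gcd(p, l)
    return p


directed_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
bipartite_path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
path_with_self_loop = [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
one_way_edge = [[0, 1], [0, 0]]

for name, G in [("directed 3-cycle", directed_cycle),
                ("bipartite path", bipartite_path),
                ("path with self-loop", path_with_self_loop)]:
    print(name, "strongly connected:", is_strongly_connected(G), "period:", period(G))
print("one-way edge strongly connected:", is_strongly_connected(one_way_edge))
```

The third example shows how a single self-loop reduces the period of a bipartite graph to 1.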

We will consider general directed graphs, in which an edge can work in both directions or
in a single one, but it is worth distinguishing undirected graphs, for which, if there is a
transition from $i$ to $j$, there is also a transition from $j$ to $i$: the adjacency
matrix is symmetric. For simplicity we will use a stronger condition: that $M$ is
symmetric. This symmetry simplifies the situation: among other things, it means that the
space of paths is time-symmetric, the matrix $M$ is diagonalizable, and the Markov process
fulfills the detailed balance condition.

Figure 4: Some examples of graphs divided into periodic components. We will later see that
constant vertex degrees (d) inside components make GRW and MERW coincide on these graphs
(which is generally not true).

Another important graph property is connectedness: for each two vertices, there exists a path
between them. The situation is simple for an undirected graph: a path from i to j means that
the backward path leads from j to i. If such a graph is not connected, a random walk would
remain in a maximal connected subset (connected component), so we could divide the graph into
such independent connected components and consider them separately.

The situation is more complex for a directed graph: a path from i to j does not imply the
existence of a path from j to i. In this case there can be vertices from which the walker
eventually gets to a subset from which he cannot return to the initial state. We will be mainly
interested in probabilistic equilibria, so we can forget about these transient vertices he cannot
return to: their probability quickly drops to zero. So without loss of generality, we can focus
on strongly connected graphs, for example chosen as a maximal strongly connected subgraph of
the original graph, which is called its strongly connected component.

A more complex property, which can also be removed without loss of generality, is graph
periodicity: the greatest common divisor of $\{n : (M^n)_{ii} > 0\}$ is called the period of
vertex $i$. In a strongly connected graph all vertices have the same period $p \in \mathbb{N}$,
so we just talk about the period of the graph: the length of each loop in this graph is a natural
multiple of $p$. The standard example is a bipartite graph: the set of vertices can be divided
into two disjoint subsets, such that edges are only between these subsets (no internal edges),
so all its loops have even length ($p = 2$). For undirected graphs each edge can be seen as a
length 2 loop, so they cannot have a period larger than 2, and period 2 means the graph is
bipartite.

Generally, if $p > 1$ we can divide the graph into disjoint subsets (periodic components) of
the same distance modulo $p$ from any fixed vertex $v$:

$$C_i := \{u : \exists_n\ (M^n)_{vu} > 0 \ \wedge\ n \equiv i \pmod p\} \qquad (2)$$

So while making a single step from $C_i$, the walker gets to $C_{(i+1) \bmod p}$. By focusing on
a single periodic component and using the $M^p$ matrix instead, these components can be treated
independently: we get $p$ separate multi-edge/weighted aperiodic graphs. We will use this
reduction to be able to focus only on irreducible graphs. Using the original $M$ matrix later, we
can connect back the behavior of these components.
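
The division into periodic components in (2) can be sketched with a breadth-first search. The code below is a minimal illustration and not the paper's procedure: it assumes the graph is strongly connected and that its period `p` is already known, and it uses the fact that all walks from the start vertex to a given vertex then have the same length modulo `p`, so the shortest distance is enough. The function name `periodic_components` and the example graphs are hypothetical.

```python
from collections import deque

def periodic_components(M, p, start=0):
    """Group vertices by their distance from `start` modulo the period p."""
    n = len(M)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in range(n):
            if M[u][w] > 0 and w not in dist:   # follow edge u -> w
                dist[w] = dist[u] + 1
                queue.append(w)
    components = [[] for _ in range(p)]
    for u in sorted(dist):
        components[dist[u] % p].append(u)
    return components

# Hypothetical examples: a bipartite 4-vertex path (p = 2) and a directed 3-cycle (p = 3).
path4 = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
cycle3 = [[0, 1, 0],
          [0, 0, 1],
          [1, 0, 0]]

print(periodic_components(path4, 2))    # [[0, 2], [1, 3]]
print(periodic_components(cycle3, 3))   # [[0], [1], [2]]
```

Every edge leads from one component to the next one in cyclic order, which is what allows each component to be treated separately with the $M^p$ matrix.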



We are now ready to recall the basic theorem for our considerations, about the dominant
eigenvector of M. It was first proven by Perron ([16]) for positive matrices and later
generalized by Frobenius ([17]) for nonnegative ones. In this case, uniqueness requires that the
graph is strongly connected and aperiodic; fulfilling both of these conditions is called
irreducibility or primitivity in the literature. We will use the first name here:

Theorem 4. Perron-Frobenius theorem (PF): for a nonnegative irreducible square matrix $M$, the
dominant eigenvalue (having the largest absolute value) is nondegenerate and the corresponding
eigenvector can be chosen as positive.

If a matrix fulfills these conditions, they are also fulfilled by its transpose, which has
the same set of eigenvalues. Finally, for the largest $\lambda > 0$, there exist exactly one
positive normalized right eigenvector and one positive normalized left eigenvector:

$$M\psi = \lambda\psi, \qquad \varphi^T M = \lambda\varphi^T \qquad (3)$$

If the matrix is symmetric, $\psi = \varphi$. For asymmetric matrices it is more convenient to
use the $\varphi^T\psi = 1$ normalization.

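The fact that the other eigenvalues have smaller absolute value allows us to use the
approximation:

$$M^l \approx \lambda^l\, \psi\varphi^T \quad \text{for } l \to \infty \qquad \left(\lambda^l\psi\varphi^T \cdot \psi = \lambda^l\psi, \quad \varphi^T \cdot \lambda^l\psi\varphi^T = \lambda^l\varphi^T\right) \qquad (4)$$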
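
Approximation (4) can be tested numerically with plain power iteration. This is a sketch under stated assumptions, not the paper's method: the helper `dominant_pair` iterates with $M + I$ (which has the same eigenvectors and eigenvalues shifted by 1), a common trick that keeps the iteration convergent even for periodic graphs. The small directed graph is a hypothetical example chosen to be strongly connected, aperiodic and asymmetric, so that $\psi \ne \varphi$.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dominant_pair(M, steps=500):
    """Power iteration: dominant eigenvalue and positive right eigenvector (sum = 1)."""
    n = len(M)
    shifted = [[M[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(steps):
        w = mat_vec(shifted, v)
        total = sum(w)
        v = [x / total for x in w]
    lam = sum(mat_vec(M, v))        # v sums to 1, so sum(M v) = lambda
    return lam, v

def mat_pow(M, l):
    n = len(M)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(l):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return P

# Hypothetical graph with edges 0->0, 0->1, 1->2, 2->0.
M = [[1, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

lam, psi = dominant_pair(M)                    # M psi = lam psi
lam_left, phi = dominant_pair(transpose(M))    # phi^T M = lam phi^T
scale = sum(a * b for a, b in zip(phi, psi))
phi = [x / scale for x in phi]                 # normalization phi^T psi = 1

l = 40
Ml = mat_pow(M, l)
max_rel_err = max(abs(Ml[i][j] / (lam ** l * psi[i] * phi[j]) - 1.0)
                  for i in range(3) for j in range(3))
print(lam, max_rel_err)
```

The relative error shrinks geometrically with $l$, at a rate set by the ratio of the second largest eigenvalue modulus to $\lambda$.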

The situation for periodic graphs is more complicated. As previously, instead of the original
matrix, let us use $M^p$ first. The graph becomes aperiodic, but loses connectivity. So we can
use the PF theorem for its single connected components, getting a unique dominant eigenvalue
$\lambda^p$ for some $\lambda > 0$ and corresponding eigenvectors ($\psi^i$, $\varphi^i$) on
each of these subsets:

$$\forall_i:\quad M^p\psi^i = \lambda^p\psi^i, \qquad (\varphi^i)^T M^p = \lambda^p(\varphi^i)^T \qquad \left(j \notin C_i \Rightarrow \psi^i_j = \varphi^i_j = 0\right)$$

Any linear combination of these right/left eigenvectors is a corresponding eigenvector of
$M^p$. Returning to the original $M$ determines the connection between these components:
$\psi^{i-1} = \frac{M\psi^i}{\lambda}$, $(\varphi^{i+1})^T = \frac{(\varphi^i)^T M}{\lambda}$
(indices taken modulo $p$). Now $\sum_j \psi^j$ and $\sum_j \varphi^j$ are corresponding
eigenvectors of $M$, for example

$$M\sum_{j=0}^{p-1}\psi^j = \lambda\sum_{j=0}^{p-1}\psi^{j-1} = \lambda\sum_{j=0}^{p-1}\psi^j \qquad (5)$$

Combinations as in (5), built for $\lambda$ being a different complex $p$-th root of
$\lambda^p$, are also eigenvectors of $M$ for that eigenvalue: in the periodic case there are
$p$ dominant eigenvalues (with the same absolute value), but only one of them is real positive.
By writing dominant positive eigenpair, we will refer to this one.
2.2 Markov process on a graph

Let us say we would like to model some system using a graph: divide the space of possibilities
into disjoint subsets, assign a vertex to each of them and choose edges according to the possible
transitions. For example, we have a semiconductor lattice of atoms and we would like to
imagine electrons jumping between such potential wells. One way to represent it as a graph
could be assigning a vertex to each atom and connecting it with its neighbors. We could also
choose a different discretization, for example into larger regions the electron could be in at a
given moment.

We usually cannot say precisely to which region a given electron will jump next: the complexity
means that the only reasonable approach seems to be a stochastic one. The question is
how to choose the probabilities of transitions between vertices of such a discretized system.
Direct measurement of these probabilities is usually difficult, so let us assume that our only
knowledge is the precise structure of such a graph, and we would like to find the most
appropriate stochastic process for it.

In such situations of limited knowledge, the maximum uncertainty principle is used:
among all probability distributions we could assume, the most appropriate is the one maximizing
entropy. In simple words: the one which assumes as little as possible. If we know only the
graph, we have no basis to assume any dependence on the history, and entropy is maximized for
Markov processes, in which the probability of a transition depends only on the vertex/state
the walker is currently in. We usually also have no basis to assume that such probabilities
vary with time, so we should focus on time homogeneous processes: these probabilities are
chosen as time independent.

In this paper we will mainly focus on time homogeneous Markov processes. Analysis of 
entropy of more complicated stochastic processes on graphs can be found for example in [19]. 

Definition 5.

Matrix $S$ is called stochastic on graph $M$, if $\forall_{ij}\ 0 \le S_{ij} \le 1$,
$\forall_i\ \sum_j S_{ij} = 1$ and $M_{ij} = 0 \Rightarrow S_{ij} = 0$,
Nonnegative vector $p = (p_i)_i$ is a probability density on this graph, if $\sum_i p_i = 1$,
Probability density $\pi$ is stationary for stochastic matrix $S$, if $\forall_j\ \sum_i \pi_i S_{ij} = \pi_j$.

$S_{ij}$ is the probability that while being in vertex $i$, the walker will choose to jump to
vertex $j$. The second condition above normalizes the probabilities and the third one restricts
transitions to edges of the graph. The knowledge of the walker's position is usually
incomplete, so we need to work on a probability density representing our knowledge. This
knowledge usually decreases as time passes and the density should approach some limit: the
stationary probability density in a given connected component, which is an eigenvector of $S$
for eigenvalue 1. We would like to use the PF theorem to get its uniqueness. For this purpose we
will require that vertex accessibility of the stochastic matrix is the same as for the original
one ($(M^l)_{ij} > 0 \Leftrightarrow (S^l)_{ij} > 0$):

Definition 6. Stochastic matrix $S$ on $M$ is nondegenerate, if $\forall_{ij}\ M_{ij} > 0 \Rightarrow S_{ij} > 0$.

To handle the situation in which the graph is periodic, as previously let us consider its
partition into periodic components: disjoint subsets of fixed distance modulo the period ($p$)
from some chosen vertex (2). The probability density visits these subsets cyclically. As
previously, using the stochastic matrix $S^p$ instead, the walker remains in a single component.
The stochastic matrix $S^p$ restricted to such a subset is aperiodic, so we can use the PF
theorem to get a unique stationary probability density. Let $\pi^0$ be the stationary
probability density on the first component ($\pi^0 S^p = \pi^0$). Now
$\pi^0 S^{p+1} = \pi^0 S$, so $\pi^0 S$ is the unique stationary probability density on the
second subset, and so on ($\lambda = 1$). Finally

$$\pi := \frac{1}{p}\sum_{i=0}^{p-1} \pi^0 S^i$$

is the unique stationary probability density on the whole graph:

Observation 7. A nondegenerate stochastic matrix on a strongly connected graph has a unique
stationary probability density.

3 Derivations and properties of MERW 

Let us assume that there is a strongly connected graph and, without any additional knowledge,
we would like to choose a stochastic matrix on it. The standard approach is that the walker
chooses where to jump with a uniform probability distribution among the possible single
transitions. We will call this choice Generic Random Walk:

Definition 8. Generic Random Walk (GRW) on graph $M$ is the stochastic process given by

$$S^{GRW(M)}_{ij} := \frac{M_{ij}}{d_i}, \qquad d_i = \sum_j M_{ij}$$

If the graph is the default one ($M$), we will use the abbreviations $S^{GRW}$ for GRW and
$S^{MERW}$ for MERW; otherwise we will use the full notation as above.

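Observation 9. For symmetric $M$ (undirected graph), the stationary probability density of GRW is

$$\pi^{GRW(M)}_i = \frac{d_i}{\sum_j d_j} \qquad (7)$$

Proof: $\sum_i d_i S^{GRW}_{ij} = \sum_i d_i \frac{M_{ij}}{d_i} = \sum_i M_{ij} = \sum_i M_{ji} = d_j$, $\ \sum_i \pi_i = 1$.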
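
Observation 9 can be verified exactly with rational arithmetic. The sketch below assumes a small symmetric example (a 4-vertex path, not necessarily a graph used by the paper). The helper names `grw` and `evolve` are illustrative.

```python
from fractions import Fraction

def grw(M):
    """Generic Random Walk: S_ij = M_ij / d_i with d_i = sum_j M_ij."""
    return [[Fraction(m, sum(row)) for m in row] for row in M]

def evolve(p, S):
    """One step of a probability density: (p S)_j = sum_i p_i S_ij."""
    n = len(S)
    return [sum(p[i] * S[i][j] for i in range(n)) for j in range(n)]

# Hypothetical undirected example: a 4-vertex path.
path4 = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

S = grw(path4)
d = [sum(row) for row in path4]
pi = [Fraction(di, sum(d)) for di in d]     # formula (7): pi_i = d_i / sum_j d_j

print([str(x) for x in pi])
print(evolve(pi, S) == pi)                  # stationarity check
```

Because the arithmetic is exact, the equality test is a genuine check of stationarity rather than a floating-point approximation.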

3.1 MERW as scale invariant limit of GRW 

The walker in GRW makes random decisions according to the knowledge about the nearest
neighbors: GRW emphasizes the distance corresponding to a single transition. The graph we are
using could be created as a discretization of a continuous system, which usually does not have
such a characteristic length, so we would rather expect a scale-invariant model. Here we will
find such a limit of GRW and call it MERW. Later on we will see that it also maximizes entropy
among all possible random walks on a given graph.

We will start from a generalization of GRW in which, instead of assuming a uniform probability
distribution among single edges (length 1 paths), we will choose a uniform distribution
among length $l$ paths and call it GRW$_l$, like in Fig. 5.

[Figure 5 panels, with the probability of going up from vertex 2 written above each graph:
GRW: 0.5, GRW4: 0.375, MERW: 0.382]

Figure 5: Example of generalizations of GRW. The number of length $l$ paths starting from a
given edge is written on its left. Above the graph there are written approximate probabilities
of going up from vertex 2, obtained by normalization of the numbers of paths. The length $l$
paths from vertex 2 are symbolized on the right side of the graphs.



Definition 10. $S^{GRW_l(M)}_{ij} := \frac{M_{ij}\sum_k (M^{l-1})_{jk}}{\sum_k (M^l)_{ik}}$ for $l \in \mathbb{N}^+$
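
The definition of GRW$_l$ can be tried out directly by counting pathways. The sketch below assumes that the graph of Fig. 5 is the 4-vertex path with vertices numbered 1 to 4 (indices 0 to 3 in the code) and that "going up from vertex 2" means the transition to vertex 1. Under this assumption the quoted values 0.5 and 0.375 come out exactly, and large $l$ approaches 0.382. The helper names are illustrative.

```python
from fractions import Fraction

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def path_counts(M, l):
    """(M^l (1,..,1)^T)_j: number of length l pathways starting in vertex j."""
    v = [1] * len(M)
    for _ in range(l):
        v = mat_vec(M, v)
    return v

def grw_l(M, l):
    """S_ij = M_ij * sum_k (M^(l-1))_jk / sum_k (M^l)_ik for l >= 1."""
    shorter = path_counts(M, l - 1)
    longer = path_counts(M, l)
    n = len(M)
    return [[Fraction(M[i][j] * shorter[j], longer[i]) for j in range(n)]
            for i in range(n)]

# Assumed graph of Fig. 5: a 4-vertex path.
path4 = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

for l in (1, 4, 40):
    print(l, float(grw_l(path4, l)[1][0]))   # probability of the step 2 -> 1
```

For $l = 1$ this reduces to ordinary GRW, and increasing $l$ shows the convergence toward the scale-invariant limit.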



We would like to calculate these probabilities in the $l \to \infty$ limit. For this purpose we
need the asymptotic behavior of $\sum_k (M^{l-1})_{jk} = (M^{l-1} \cdot (1, 1, .., 1)^T)_j$ for
all vertices $j$. For an irreducible matrix we can directly use (4):

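Observation 11. For a strongly connected aperiodic graph, the normalized number of one-sided
infinite pathways from $j$ to the future (or past) is proportional to $\psi_j$ ($\varphi_j$):

$$\lim_{l\to\infty} \frac{\sum_k (M^l)_{jk}}{\sum_k (M^l)_{j'k}} = \frac{\psi_j}{\psi_{j'}}, \qquad \lim_{l\to\infty} \frac{\sum_k (M^l)_{kj}}{\sum_k (M^l)_{kj'}} = \frac{\varphi_j}{\varphi_{j'}} \qquad (8)$$

where $M\psi = \lambda\psi$, $\varphi^T M = \lambda\varphi^T$ is the dominant positive eigenpair.

If the graph has period $p > 1$, the equation is fulfilled if $j$ and $j'$ are in the same
periodic component ($p$ divides their distance).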

For a periodic graph, as previously, we take the adjacency matrix $M^p$ first. As long as $j$
and $j'$ are in the same periodic component, we can use equation (4) for the aperiodic matrix.
This way we have shown the above limit (8) for $l$ being natural multiples of $p$. For a general
$l = np + r$, let us observe that we can write
$(M^{np+r} \cdot (1, 1, .., 1)^T)_j = (M^{np} \cdot (M^r (1, 1, .., 1)^T))_j$, which leads to
some dominant eigenvector of $M$. There are $p$ of them (formula (5)), but the ratio of their
coordinates inside a single periodic component does not depend on this choice of the eigenvector.

Returning to the scale-free limit of GRW, all neighbors of a given vertex are in the same
periodic component, so we can use the above observation: in the $l \to \infty$ limit, the
probability of jumping from vertex $i$ to vertex $j$ is proportional to $M_{ij}\psi_j$. The
normalization is $\sum_j M_{ij}\psi_j = \lambda\psi_i$, so finally we obtain the stochastic
matrix:

[Figure 6 panels: adjacency matrix $M$ of a 4-vertex graph; Generic Random Walk with
$\pi^{GRW} = (1, 2, 2, 1)/6$; Maximal Entropy with $\psi = (1, 1+f, 1+f, 1)/\sqrt{6+2f}$;
useful constant $f := (\sqrt{5}-1)/2 \approx 0.618$, $f' = 1 - f$]

Figure 6: MERW and GRW for a simple graph. Probabilities of paths $2 \to 1 \to 2$ and
$2 \to 3 \to 2$ are equal in MERW, while in GRW the first one is twice as probable.



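Observation 12. For a strongly connected graph, in the limit $l \to \infty$ of GRW$_l$ we get

$$S^{MERW(M)}_{ij} = \frac{M_{ij}}{\lambda}\frac{\psi_j}{\psi_i} \qquad (9)$$

$$\pi_i = \varphi_i\psi_i \qquad (= \psi_i^2 \text{ for symmetric } M) \qquad (10)$$

where $M\psi = \lambda\psi$, $\varphi^T M = \lambda\varphi^T$ are the dominant positive
eigenpairs, $\varphi^T\psi = 1$.

Let us check that the above is the unique stationary probability distribution:

$$\sum_i \pi_i S_{ij} = \sum_i \varphi_i\psi_i \frac{M_{ij}}{\lambda}\frac{\psi_j}{\psi_i} = \frac{\psi_j}{\lambda}\sum_i \varphi_i M_{ij} = \varphi_j\psi_j = \pi_j \qquad (11)$$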



This time we have guessed this density, but it will be derived while considering ensembles of 
full paths. 
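
The MERW formulas of Observation 12 can be evaluated numerically. The sketch below is a minimal illustration, assuming that the graph of Fig. 6 is the 4-vertex path (vertices 1 to 4 are indices 0 to 3). The function names `dominant_pair` and `merw` are hypothetical, and the power iteration with $M + I$ is a standard numerical device, not something taken from the paper.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dominant_pair(M, steps=500):
    """Power iteration on M + I: dominant eigenvalue of M and positive eigenvector."""
    n = len(M)
    shifted = [[M[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(steps):
        w = mat_vec(shifted, v)
        total = sum(w)
        v = [x / total for x in w]
    return sum(mat_vec(M, v)), v

def merw(M):
    """Return (lambda, S, pi) with S_ij = M_ij psi_j / (lambda psi_i), pi_i = phi_i psi_i."""
    lam, psi = dominant_pair(M)
    _, phi = dominant_pair(transpose(M))
    norm = sum(a * b for a, b in zip(phi, psi))
    n = len(M)
    S = [[M[i][j] * psi[j] / (lam * psi[i]) for j in range(n)] for i in range(n)]
    pi = [phi[i] * psi[i] / norm for i in range(n)]
    return lam, S, pi

# Assumed graph of Fig. 6: a 4-vertex path.
path4 = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]

lam, S, pi = merw(path4)
p_212 = S[1][0] * S[0][1]      # path 2 -> 1 -> 2
p_232 = S[1][2] * S[2][1]      # path 2 -> 3 -> 2
residual = max(abs(sum(pi[i] * S[i][j] for i in range(4)) - pi[j]) for j in range(4))
print(lam, S[1][0], pi, p_212, p_232, residual)
```

Both length 2 loops from vertex 2 come out equally probable, while the GRW values for the same paths would be 1/2 and 1/4.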

We can now calculate the stochastic propagator: if the walker is in vertex $i$, the probability
that after $l$ steps it will be in vertex $j$ is

$$(S^l)_{ij} = \sum_{\gamma_1, .., \gamma_{l-1}} S_{i\gamma_1} S_{\gamma_1\gamma_2} \cdots S_{\gamma_{l-1}j} = \frac{(M^l)_{ij}}{\lambda^l}\frac{\psi_j}{\psi_i} \qquad (12)$$

It can be imagined that there are $(M^l)_{ij}$ pathways and each of them has probability
$\frac{1}{\lambda^l}\frac{\psi_j}{\psi_i}$. While in GRW the walker can choose transition
probabilities using only local knowledge, the $\frac{\psi_j}{\psi_i}$ term in the MERW
transition probability formula depends on the situation of the whole system: this effective
model is nonlocal. It does not mean that the walker directly uses these nonlocal rules. They are
used only by us: to make the best predictions, we need to know the whole space of possibilities.

3.1.1 Equally probable pathways

Calculating the MERW probability of a $(\gamma_i)_{i=0}^{l}$ path, we get the interesting
observation that, apart from the number of pathways it contains, it does not depend on the
internal vertices:

$$S_{\gamma_0\gamma_1} \cdots S_{\gamma_{l-1}\gamma_l} = \frac{M_{\gamma_0\gamma_1} \cdots M_{\gamma_{l-1}\gamma_l}}{\lambda^l}\frac{\psi_{\gamma_l}}{\psi_{\gamma_0}} \qquad (13)$$

For a simple graph it means that for fixed length and ending points, all paths of this length
between them are equally probable. For multi-edge (and weighted) graphs, we have to
remember that paths consist of many pathways, and so the probabilities of paths should be
proportional to these numbers of pathways:

Definition 13. Pathways $(\gamma_0, .., \gamma_l)$ and $(\gamma'_0, .., \gamma'_l)$ are equally probable if

$$\frac{S_{\gamma_0\gamma_1} \cdots S_{\gamma_{l-1}\gamma_l}}{S_{\gamma'_0\gamma'_1} \cdots S_{\gamma'_{l-1}\gamma'_l}} = \frac{M_{\gamma_0\gamma_1} \cdots M_{\gamma_{l-1}\gamma_l}}{M_{\gamma'_0\gamma'_1} \cdots M_{\gamma'_{l-1}\gamma'_l}} \qquad (= 1 \text{ for simple graph}) \qquad (14)$$

Observation 14. Maximal Entropy Random Walk is the only random walk in which, for any
length and any two vertices, all pathways of this length between them are equally probable.

We already know that MERW fulfills the above condition. To see that the condition (14)
determines the stochastic process in a unique way, for each vertex ($i$) and two of its outgoing
edges (to $j$ and $j'$) we should find a vertex ($k$) and length ($l$), such that there exist two
length $l$ paths between $i$ and $k$: one starting with the first and one with the second edge.
In such a case, counting the corresponding pathways and using the condition (14), we get a unique
$S_{ij}/S_{ij'}$ proportion.

Let $p \ge 1$ be the period of $M$. Now $M^p$ is irreducible inside each periodic component, so
some power of it ($M^{np}$) is positive inside all components. Because $j$ and $j'$ are in the
same component, taking $k$ as any point in this component and $l = np + 1$, we get the existence
of the required paths.

Generic Random Walk is usually different from MERW and so the condition (14) is no
longer valid: GRW prefers paths through vertices of lower degrees, like in Fig. 6:

$$S^{GRW}_{\gamma_0\gamma_1} \cdots S^{GRW}_{\gamma_{l-1}\gamma_l} = \frac{M_{\gamma_0\gamma_1} \cdots M_{\gamma_{l-1}\gamma_l}}{d_{\gamma_0} d_{\gamma_1} \cdots d_{\gamma_{l-1}}}$$

3.1.2 Renormalization 

Another view on scale invariance is some freedom in choosing the spatial discretization of a
continuous system, like in Fig. 7. Transforming the graph $M$ (for example representing single
transitions) into a multi-edge graph $M' = M^l$ whose edges correspond to some fixed number of
transitions should not change the stochastic model:

$$\left(\left(S^{MERW(M)}\right)^l\right)_{ij} = \frac{(M^l)_{ij}}{\lambda^l}\frac{\psi_j}{\psi_i} = S^{MERW(M^l)}_{ij}$$






Figure 7: Renormalization of some defected lattice graph: the original simple graph is
transformed into corresponding multi-edge graphs on sublattices of $\sqrt{2}$ times larger
lattice constant. Above the self-loops there is written their number. In contrast to GRW, MERW
is consistent with such a change of discretization scale: $S^{MERW(M^l)} = \left(S^{MERW(M)}\right)^l$.



For GRW the analogous relation is usually not satisfied: $\left(S^{GRW(M)}\right)^l$ has
stationary probability density $\pi^{GRW}$, while

$$S^{GRW(M^l)}_{ij} = \frac{(M^l)_{ij}}{\sum_k (M^l)_{ik}}$$

thanks to (4), for an aperiodic strongly connected graph, goes to
$\frac{\varphi_j}{\sum_k \varphi_k}$ for $l \to \infty$ (it can also be seen from Observation
11), which usually has a completely different stationary probability.
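
The renormalization property can be checked on a small example. The sketch below compares two steps of a walk on $M$ with a single step on the multi-edge graph $M^2$, for both MERW and GRW. The 3-vertex graph with one self-loop is a hypothetical aperiodic example, and the helper names are illustrative. MERW is built only from the right eigenvector, which is all that the transition matrix needs.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dominant_pair(M, steps=500):
    """Power iteration on M + I: dominant eigenvalue of M and positive eigenvector."""
    n = len(M)
    shifted = [[M[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(steps):
        w = mat_vec(shifted, v)
        total = sum(w)
        v = [x / total for x in w]
    return sum(mat_vec(M, v)), v

def merw_matrix(M):
    lam, psi = dominant_pair(M)
    n = len(M)
    return [[M[i][j] * psi[j] / (lam * psi[i]) for j in range(n)] for i in range(n)]

def grw_matrix(M):
    return [[m / sum(row) for m in row] for row in M]

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical aperiodic graph: a 3-vertex path with a self-loop at vertex 0.
M = [[1, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
M2 = mat_mul(M, M)              # multi-edge graph of two-step transitions

S_merw = merw_matrix(M)
S_grw = grw_matrix(M)
merw_gap = max_diff(mat_mul(S_merw, S_merw), merw_matrix(M2))
grw_gap = max_diff(mat_mul(S_grw, S_grw), grw_matrix(M2))
print(merw_gap, grw_gap)
```

The MERW gap is at the level of rounding error, while GRW on the coarser graph is a visibly different process.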



3.1.3 When GRW=MERW? 

GRW and MERW are usually different, so let us now characterize the cases in which they are the
same:

$$\forall_{ij}\ \frac{M_{ij}}{d_i} = \frac{M_{ij}}{\lambda}\frac{\psi_j}{\psi_i} \quad \Leftrightarrow \quad \forall_{i,j:\, M_{ij} > 0}\ \ \lambda\frac{\psi_i}{\psi_j} = d_i \qquad (17)$$

For vertex $i$, this condition has to be fulfilled for all its neighbors, so $\psi$ has to be
constant inside the neighborhood of any vertex. If the neighborhoods of two vertices are not
disjoint, $\psi$ has to be constant in their union and so on: we can expand this set with
non-disjoint neighborhoods of succeeding vertices. This way we get a division of all vertices
into disjoint components, such that the neighborhood of each vertex is a subset of one of them.
Transitions from all vertices of such a single component lead to the same other component, so
the above construction is exactly the division of the graph into periodic components (or we get
a single component for a strongly connected aperiodic graph).

Knowing that $\psi$ has to be constant inside periodic components, (17) means that vertex
degrees also have to be constant inside components. Multiplying the eigenvector by $M^p$, the
coordinates are multiplied by succeeding degrees, so the eigenvalue is

$$\lambda = \sqrt[p]{\prod_{i=1}^{p} d_{i\text{-th component}}} \qquad (= d \text{ for regular graph})$$

For symmetric $M$, constant $\sum_j M_{ij}$ means that $\sum_i M_{ij}$ is also constant inside
periodic components. For directed graphs the situation can be more irregular, like in Fig. 4.
Finally

Observation 15. GRW and MERW are the same for a strongly connected graph, if this:

- undirected graph is regular (has constant degrees) or bipartite with constant degrees inside
both periodic components,

- directed graph has constant $d_i = \sum_j M_{ij}$ inside each periodic component.

3.1.4 Detailed balance condition

The probability that the walker uses the $(ij)$ edge is the probability of being in vertex $i$
multiplied by the probability of then using the $(ij)$ edge: it is $\pi_i S_{ij}$, normalized
to 1:

$$\sum_{ij} \pi_i S_{ij} = \sum_i \pi_i = 1$$

We can now look at the symmetry condition for a stochastic matrix:

Definition 16. Stochastic matrix $S$ with stationary probability density $\pi$ fulfills the
detailed balance condition iff

$$\forall_{ij}\ \ \pi_i S_{ij} = \pi_j S_{ji}$$

It is natural for undirected graphs:

Observation 17. If $M$ is symmetric, $S^{GRW}$ and $S^{MERW}$ fulfill the detailed balance condition.

Proof: For symmetric $M$,

$$\pi^{GRW}_i S^{GRW}_{ij} = \frac{d_i}{\sum_k d_k}\frac{M_{ij}}{d_i} = \frac{M_{ij}}{\sum_k d_k} = \frac{M_{ji}}{\sum_k d_k} = \pi^{GRW}_j S^{GRW}_{ji},$$

$$\pi^{MERW}_i S^{MERW}_{ij} = \psi_i^2\, \frac{M_{ij}}{\lambda}\frac{\psi_j}{\psi_i} = \frac{\psi_i M_{ij} \psi_j}{\lambda} = \frac{\psi_j M_{ji} \psi_i}{\lambda} = \pi^{MERW}_j S^{MERW}_{ji}.$$

So if $M$ is symmetric, the walker uses edges equally frequently in both directions. This is
usually not true for nonsymmetric $M$; for example, the walker could prefer one circulation
direction in a ring-like graph.

For nonsymmetric $M$, there appears some imbalance of probability flow in the stationary
situation. In analogy to electric current, we can define an antisymmetric probability current
describing the resultant flow:

$$I_{ij} := \pi_i S_{ij} - \pi_j S_{ji} = -I_{ji}$$

It vanishes for symmetric $M$ and generally fulfills an analogue of Kirchhoff's first law
(continuity equation):

$$\sum_j I_{ij} = \sum_j \pi_i S_{ij} - \sum_j \pi_j S_{ji} = \pi_i - \pi_i = 0$$

3.2 Entropy of random walks

Entropy can be seen as the amount of information required to describe a given system.
Quantitatively it can be expressed in many units; in physics it is usually multiplied by the
Boltzmann constant. We will use that convention later, but for better intuition in this section
we will count entropy in bits of information. The choice of one of $2^n$ elements generally
requires $n$ bits of information, so in this section we use entropy as the base 2 logarithm
($\lg = \log_2$) of the number of possible choices (Boltzmann formula up to a multiplicative
constant).

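Assume there is some long sequence of 2 symbols and we know the probability of the first
one: $p \in [0, 1]$, $\tilde{p} := 1 - p$. The number of such sequences behaves asymptotically as:

$$\binom{n}{pn} = \frac{n!}{(pn)!\,(\tilde{p}n)!} \approx (2\pi)^{-1/2}\frac{n^{n+1/2}\,e^{-n}}{(pn)^{pn+1/2}\,(\tilde{p}n)^{\tilde{p}n+1/2}\,e^{-n}} = (2\pi n p\tilde{p})^{-1/2}\, p^{-pn}\, \tilde{p}^{-\tilde{p}n} = (2\pi n p\tilde{p})^{-1/2}\, 2^{-n(p\lg p + \tilde{p}\lg\tilde{p})}$$

$$h(p) := -p\lg p - \tilde{p}\lg\tilde{p} = \lim_{n\to\infty} \frac{\lg\binom{n}{pn}}{n} \qquad (18)$$

where we have used Stirling's formula: $\lim_{n\to\infty} \frac{n!}{\sqrt{2\pi n}\,(n/e)^n} = 1$.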
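
The asymptotics in (18) can be checked against an exact binomial coefficient. The sketch below uses only the Python standard library. The values $n = 10000$ and $p = 0.3$ are arbitrary choices for illustration, and `h` is the binary entropy in bits with the usual convention that a zero-probability symbol contributes nothing.

```python
from math import comb, log2, pi

def h(p):
    """Binary entropy in bits: -p lg p - (1-p) lg(1-p), skipping zero terms."""
    return -sum(q * log2(q) for q in (p, 1.0 - p) if q > 0)

n, p = 10000, 0.3
k = round(p * n)

exact = log2(comb(n, k))    # lg of the exact number of sequences
# Stirling-based estimate: n h(p) - (1/2) lg(2 pi n p (1-p))
stirling = n * h(p) - 0.5 * log2(2 * pi * n * p * (1 - p))

print(exact / n, h(p), exact - stirling)
```

The entropy per symbol agrees with $h(p)$ up to a correction of order $\lg(n)/n$, which is exactly the prefactor $(2\pi n p\tilde{p})^{-1/2}$ in the formula above.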

If we do not know anything about a length $n$ sequence of two symbols, the number of
such sequences is $2^n$. We see that also while assuming $p = 1/2$ we get the same asymptotics:
these sequences completely dominate the space of all sequences, like in Fig. [S]. It is an
example of the maximum uncertainty principle: if we do not know anything about the probability
distribution among some events, the best choice is to assume the uniform probability
distribution. Generally the average entropy is the coefficient in the exponent, so again
assuming the probability distribution maximizing entropy (uncertainty) means focusing on the
sequences which asymptotically dominate the rest of them: almost all sequences fulfill the
entropy-maximizing probability distribution. This is generally called the Asymptotic
Equipartition Property in information theory; for more information see e.g. [19].

Analogously, for more symbols/events with probability distribution $(p_i)_i$, the average
entropy per symbol is:

$$h((p_i)_i) = -\sum_i p_i \lg(p_i) \qquad (19)$$

where we assume $0 \lg(0) = 0$.

Let us now apply it to a stochastic process ($S$) on a simple graph ($M_{ij} \in \{0, 1\}$): if
the walker is in vertex $i$, his next step will contain $-\sum_j S_{ij}\lg(S_{ij})$ bits of
information. The walker is in vertex $i$ in asymptotically $\pi_i$ of cases, so finally the
average entropy production is $H(S) = -\sum_i \pi_i \sum_j S_{ij}\lg(S_{ij})$ per step. For a
multi-edge graph the situation is a bit more complicated: now there are
$M_{ij} \in \mathbb{N}$ identical edges from $i$ to $j$, each of probability $S_{ij}/M_{ij}$.
So the $S_{ij}\lg(S_{ij})$ term in the entropy formula changes into
$M_{ij}\frac{S_{ij}}{M_{ij}}\lg\left(\frac{S_{ij}}{M_{ij}}\right) = S_{ij}\lg\left(\frac{S_{ij}}{M_{ij}}\right)$.

Definition 18. Average entropy production for stochastic process $S$ with stationary
probability $\pi$ is

$$H(S) := -\sum_i \pi_i \sum_j S_{ij}\lg(S'_{ij}) \qquad (20)$$

where for a simple graph $S' := S$ and generally $S'_{ij} := \frac{S_{ij}}{M_{ij}}$ ($= 0$ for $M_{ij} = 0$):

$$H(S) = -\sum_i \pi_i \sum_j S_{ij}\lg\left(\frac{S_{ij}}{M_{ij}}\right) = -\sum_i \pi_i \sum_j S_{ij}\lg(S_{ij}) + \sum_i \pi_i \sum_j S_{ij}\lg(M_{ij}) \qquad (21)$$

The last formula can also be used mathematically for a weighted graph, with $M$ having
non-natural values. In this case, we will see the additional term (with $\lg M_{ij}$) as the
average energy, and so the whole formula as minus the average free energy per step.

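To show that, among all stochastic processes on a given graph, the Maximal Entropy Random
Walk is indeed the only one maximizing this formula, let us calculate the entropy of the
probability distribution of length $l$ pathways expected in this stochastic process:

$$-\sum_{\gamma_0 .. \gamma_l} \pi_{\gamma_0} S_{\gamma_0\gamma_1} \cdots S_{\gamma_{l-1}\gamma_l}\left(\lg(S'_{\gamma_0\gamma_1}) + \lg(S'_{\gamma_1\gamma_2} \cdots S'_{\gamma_{l-1}\gamma_l})\right) =$$

$$= H(S) - \sum_{\gamma_1 .. \gamma_l} \pi_{\gamma_1} S_{\gamma_1\gamma_2} \cdots S_{\gamma_{l-1}\gamma_l}\lg(S'_{\gamma_1\gamma_2} \cdots S'_{\gamma_{l-1}\gamma_l}) = \ldots = l\, H(S)$$

where $S'_{ij} := S_{ij}/M_{ij}$ to include e.g. multi-edge graphs.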

We see that the average entropy production of a stochastic process is exactly the entropy
growth per symbol of the probability distribution of pathways it generates. Without additional
constraints, the only probability distribution maximizing entropy is the uniform distribution,
so the average entropy production is maximized only by a stochastic process generating the
uniform probability distribution among pathways. For finite paths we already know from
Observation 14 that MERW is the only random walk having a uniform probability distribution among
pathways of fixed length between fixed vertices. In the next section we will see that there is
also an analogous condition for infinite pathways.

Let us now find the maximal average entropy production available for a given graph and
check that MERW really achieves it. Assume there is some set of pathways such that $v_i$ of them
end in vertex $i$. Expanding this ensemble by a single step in all possible ways, we get the
$v^T M$ vector of numbers of pathways. So the maximal increase of the number of pathways per
step is multiplication by the dominant eigenvalue ($\lambda$): their number asymptotically grows
like $\lambda^l$. The uniform distribution among them maximizes the entropy, leading to the
upper bound:

Observation 19. For stochastic process $S$ on graph $M$,

$$H(S) \le \lg(\lambda) \qquad (22)$$

where $\lambda$ is the positive dominant eigenvalue of $M$.

Let us check that MERW indeed achieves this bound:

$$H(S^{MERW}) = -\sum_i \varphi_i\psi_i \sum_j \frac{M_{ij}}{\lambda}\frac{\psi_j}{\psi_i}\lg\left(\frac{1}{\lambda}\frac{\psi_j}{\psi_i}\right) = \lg\lambda + \sum_i \varphi_i\psi_i\lg(\psi_i) - \sum_j \varphi_j\psi_j\lg(\psi_j) = \lg\lambda$$

The fact that a random walk cannot have larger entropy leads to interesting inequalities. For
example, for GRW with symmetric $M$:

$$\sum_i \frac{d_i}{\sum_k d_k}\sum_j \frac{M_{ij}}{d_i}\lg(d_i) = \frac{\sum_i d_i\lg(d_i)}{\sum_k d_k} \le \lg(\lambda)$$

for any such nonnegative matrix $M$. Assuming the uniform distribution among the nearest
neighbors in GRW can be seen as a local maximization of entropy (for each $i$ maximize
$-\sum_j S_{ij}\lg(S_{ij})$), while in MERW we maximize the average entropy production.
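
The bound (22) and its saturation by MERW can be illustrated numerically. The sketch below evaluates formula (21) for GRW and MERW on a hypothetical symmetric example (a 4-vertex path), so that $\varphi = \psi$ and $\pi_i \propto \psi_i^2$. The helper names are illustrative and the power iteration on $M + I$ is a numerical convenience, not part of the paper.

```python
from math import log2

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dominant_pair(M, steps=500):
    """Power iteration on M + I: dominant eigenvalue of M and positive eigenvector."""
    n = len(M)
    shifted = [[M[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(steps):
        w = mat_vec(shifted, v)
        total = sum(w)
        v = [x / total for x in w]
    return sum(mat_vec(M, v)), v

def entropy_production(S, pi, M):
    """H(S) = -sum_i pi_i sum_j S_ij lg(S_ij / M_ij), summed over existing edges."""
    n = len(M)
    return -sum(pi[i] * S[i][j] * log2(S[i][j] / M[i][j])
                for i in range(n) for j in range(n) if M[i][j] > 0)

# Hypothetical symmetric example: a 4-vertex path.
path4 = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
n = len(path4)

d = [sum(row) for row in path4]
S_grw = [[path4[i][j] / d[i] for j in range(n)] for i in range(n)]
pi_grw = [di / sum(d) for di in d]

lam, psi = dominant_pair(path4)
S_merw = [[path4[i][j] * psi[j] / (lam * psi[i]) for j in range(n)] for i in range(n)]
norm = sum(x * x for x in psi)
pi_merw = [x * x / norm for x in psi]       # symmetric M: pi_i = psi_i^2 (normalized)

H_grw = entropy_production(S_grw, pi_grw, path4)
H_merw = entropy_production(S_merw, pi_merw, path4)
print(H_grw, H_merw, log2(lam))
```

On this graph GRW produces $\sum_i d_i\lg(d_i)/\sum_k d_k = 2/3$ bits per step, slightly below the MERW value $\lg\lambda$.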

In [14] there are other useful inequalities between some effective degrees of a graph:

$$\min_i d_i \le \frac{\sum_i d_i}{N} \le \exp\left(\frac{\sum_i d_i\ln(d_i)}{\sum_k d_k}\right) \le \lambda \le \max_i d_i \qquad (23)$$

where $N$ denotes the number of vertices. In (23) the first and the fourth inequalities are
trivial, the third is equation (22). The second inequality can be derived using convexity of
$F(\epsilon) := \ln\left(\sum_i d_i^{\epsilon}\right)$:

$$\frac{\sum_i d_i\ln(d_i)}{\sum_i d_i} = F'(1) \ge F(1) - F(0) = \ln\left(\frac{\sum_i d_i}{N}\right)$$

No additional assumptions were required for inequality (23), so it is fulfilled not
only for undirected simple graphs like in [14], but also for general multi-edge or weighted
graphs.



3.3 MERW from the point of view of full paths and Born rules 

Up to now we were considering one-sided infinite paths; now we will look at ensembles of full
paths, infinite in both directions: past and future. It leads to a better understood derivation
of the MERW formulas. We will see why considering a statistical ensemble of full scenarios leads
to the Born rules for constant time cuts: to translate amplitudes into probabilities we
do need to "square" them. Intuitively, amplitudes correspond to probabilities at the ends of
the past/future half-spaces, and we need to multiply them to estimate probabilities of events at
a given moment.

Observation 14 says that MERW is the only random walk in which, while fixing vertices at
the ends of a time range, all pathways between them are equally probable. Let us extend it to
full paths: $(\gamma_t)_{t=-\infty}^{\infty}$ and $(\gamma'_t)_{t=-\infty}^{\infty}$. If the graph
is periodic, $\gamma_t$ and $\gamma'_t$ could belong to different periodic components for all
$t$; in such a case it is enough to shift indexes to synchronize them. Let us make a general
observation implying that all full pathways on a strongly connected graph are intuitively
equiprobable for MERW:

Observation 20. Assume there is a strongly connected graph and an equivalence relation, in
which subpaths are equivalent if they have the same length and ending points. For two full paths
$(\gamma_t)_{t=-\infty}^{\infty}$, $(\gamma'_t)_{t=-\infty}^{\infty}$ such that $\gamma_0$ and
$\gamma'_0$ are in the same periodic component, for any large enough finite time interval
($[t_1, t_2]$) we can find a full path $(\gamma''_t)_{t=-\infty}^{\infty}$ which is equivalent
to $(\gamma'_t)_{t=-\infty}^{\infty}$ and in the chosen time segment is equal to
$(\gamma_t)_{t=-\infty}^{\infty}$.

Proof: Let us focus first on an aperiodic graph ($\gamma_0$ and $\gamma'_0$ are always in the
same periodic component): there is some $n \in \mathbb{N}$ such that
$\forall_{ij}\ (M^n)_{ij} > 0$. Let us choose any length $2n$ subpath of
$(\gamma'_t)_{t=-\infty}^{\infty}$: from its beginning point there exists a length $n$ path to
any vertex, and from there a further length $n$ path to the original ending point. So this
subpath is equivalent to a subpath having any vertex in the middle; let us choose it as the
corresponding vertex of $(\gamma_t)_{t=-\infty}^{\infty}$. Now doing it for two such segments,
$[t_1 - n, t_1 + n]$ and $[t_2 - n, t_2 + n]$, and using the equivalence relation a third time
(between their middle vertices), we get $(\gamma''_t)_{t=-\infty}^{\infty}$ as required (dashed
line in Fig. 8).

For periodic graphs, thanks to $\gamma_0$ and $\gamma'_0$ being in the same periodic component,
we can make the above construction for $M^p$.

We will now focus on the opposite route: assume that all full pathways are asymptotically
equally probable, to derive the MERW formulas and get better intuition about them. Let us
imagine that everything is happening in discrete space-time, the set of vertices times
$\mathbb{Z}$: the graph is our space and the time is the set of all integers. We are interested
in finding correlations: probabilistic dependence between situations at different times. Let us
assume that there is a length $l \in \mathbb{N}$ segment of time (for example $[0, l]$) and we
are interested in the probability of situations on its ends. For $l = 0$ it corresponds to the
probability distribution of events at a single moment (measurement outcomes), for $l = 1$ it
corresponds to transition probabilities, which for a Markovian process determine the situation
for larger $l$.

The situation looks like in Fig. 8: we grow finite length ensembles of paths to estimate the
probability distribution of situations inside some fixed length time segment. Let us choose
some pattern (path $(v_i)_{i=0}^{l}$):

$$\left(N_v^{k,k'}\right)_{ab} := \sum_{(\gamma_t)_{t=-k}^{l+k'}:\ \gamma_0 = v_0, .., \gamma_l = v_l,\ \gamma_{-k} = a,\ \gamma_{l+k'} = b} M_{\gamma_{-k}\gamma_{-k+1}} M_{\gamma_{-k+1}\gamma_{-k+2}} \cdots M_{\gamma_{l+k'-1}\gamma_{l+k'}}$$

Now for a strongly connected graph, using Observation 11 twice, to take the $k \to \infty$ and
$k' \to \infty$ limits for $v$ and some other pattern $(w_i)_{i=0}^{l}$, we get

$$\frac{\mathrm{Prob}(v)}{\mathrm{Prob}(w)} = \lim_{k,k'\to\infty} \frac{\sum_{ab}\left(N_v^{k,k'}\right)_{ab}}{\sum_{ab}\left(N_w^{k,k'}\right)_{ab}} = \lim_{k,k'\to\infty} \frac{\sum_{ab} (M^k)_{av_0} M_{v_0v_1} \cdots M_{v_{l-1}v_l} (M^{k'})_{v_l b}}{\sum_{ab} (M^k)_{aw_0} M_{w_0w_1} \cdots M_{w_{l-1}w_l} (M^{k'})_{w_l b}} =$$

$$= \frac{\varphi_{v_0}}{\varphi_{w_0}} \cdot \frac{M_{v_0v_1} \cdots M_{v_{l-1}v_l}}{M_{w_0w_1} \cdots M_{w_{l-1}w_l}} \cdot \frac{\psi_{v_l}}{\psi_{w_l}} \qquad (24)$$

where $M\psi = \lambda\psi$, $\varphi^T M = \lambda\varphi^T$ are the dominant eigenvectors.

[Figure 8 panel labels: past halfplane, future halfplane]

Figure 8: Top: schematic picture for Observation 20 of exchanging any large enough subpath
using the equivalence relation for finite intervals. Bottom: calculating the probability of
patterns from ensembles of all allowed paths on an interval growing in both directions.

We see that the probability of pattern $v$ is proportional to the number of pathways it contains
($M_{v_0v_1} \cdots M_{v_{l-1}v_l}$, 1 for simple graphs) and to the probabilities of the ending
points of the past/future halfplanes: $\varphi_{v_0}$ and $\psi_{v_l}$.

Previously the stationary probability formula was guessed and checked (11); now we can derive
it using $l = 0$, for which the past and future halfplanes "glue together":

$$\pi_i \propto \varphi_i\psi_i \qquad (\pi_i \propto \psi_i^2 \text{ for symmetric } M) \qquad (25)$$

For $l = 1$ we get transition probabilities:

$$\mathrm{Prob}((ij)) \propto \varphi_i M_{ij} \psi_j, \qquad S_{ij} \propto \frac{\mathrm{Prob}((ij))}{\pi_i} \propto M_{ij}\frac{\psi_j}{\psi_i}$$

from which, using the normalization condition ($\sum_j S_{ij} = 1$), we get the missing
$1/\lambda$ coefficient of the MERW formulas: assuming the uniform probability distribution
among full pathways indeed uniquely leads to MERW.
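
The "gluing" of past and future halfplanes can be imitated by brute-force pathway counting. In the sketch below, the probability of finding the walker in vertex $i$ in the middle of a uniform ensemble of length $2k$ pathways is proportional to the number of length $k$ pathways ending in $i$ times the number starting in $i$. The graph is a hypothetical 5-node segment with self-loops and one defect (a removed self-loop); it is symmetric, so both numbers coincide. The value $k = 200$ and all helper names are illustrative assumptions.

```python
def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dominant_pair(M, steps=500):
    """Power iteration on M + I: dominant eigenvalue of M and positive eigenvector."""
    n = len(M)
    shifted = [[M[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    v = [1.0] * n
    for _ in range(steps):
        w = mat_vec(shifted, v)
        total = sum(w)
        v = [x / total for x in w]
    return sum(mat_vec(M, v)), v

# Hypothetical segment-like graph: self-loops everywhere except node 1 (a defect).
M = [[1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1]]

k = 200
counts = [1] * len(M)
for _ in range(k):
    counts = mat_vec(M, counts)     # exact integers: length k pathways from each node

# Two-sided ensemble: (pathways from the past) * (pathways to the future).
two_sided = [c * c for c in counts]
born = [c / sum(two_sided) for c in two_sided]
# One-sided ensemble for comparison: amplitudes without squaring.
one_sided = [c / sum(counts) for c in counts]

lam, psi = dominant_pair(M)
norm = sum(x * x for x in psi)
psi_squared = [x * x / norm for x in psi]

born_gap = max(abs(a - b) for a, b in zip(born, psi_squared))
one_sided_gap = max(abs(a - b) for a, b in zip(one_sided, psi_squared))
print(born, psi_squared, born_gap, one_sided_gap)
```

The two-sided counting reproduces $\pi_i \propto \psi_i^2$, while the one-sided ensemble gives a density proportional to $\psi_i$, which differs noticeably on this graph.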



For / > 1 we get probability of pathways as previously (13), so equiprobability assumption 
leads to Markovian process as expected. 

The most interesting from this derivation seems to be clear understanding of Born formulas 



(25). In the next subsection we will see that xp corresponds to the ground state of discrete 
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Schrodinger's equation (or Bose-Hubbard Hamiltonian for single particle) and later of the 
original Schrodinger equation after introducing potential and making infinitesimal limit. 

The intuition about Born rules is that amplitudes describe probability distribution at the ends of past/future halfplanes and if we want to translate them into probability distribution on constant time cuts, we need to multiply both amplitudes. Intuitively, to draw some event at given time, we have to draw it twice: from the past and from the future of abstract trajectories we consider in our ensemble. Time dependent case will bring more intuition.
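The "gluing" of past and future halfplanes can be checked numerically. The following is a minimal sketch, not the author's code: the 4-vertex directed graph `M_example` and the helper names are hypothetical, chosen only for illustration. It counts paths extending 200 steps into the past and the future of a vertex and compares the resulting vertex distribution with $\pi_i \propto \varphi_i\psi_i$.

```python
import numpy as np

def dominant_eigenvectors(M):
    """Dominant eigenvalue with nonnegative left (phi) and right (psi) eigenvectors."""
    eigvals, right = np.linalg.eig(M)
    k = np.argmax(eigvals.real)
    psi = np.abs(right[:, k].real)
    eigvals_left, left = np.linalg.eig(M.T)
    k_left = np.argmax(eigvals_left.real)
    phi = np.abs(left[:, k_left].real)
    return eigvals[k].real, phi, psi

def central_vertex_distribution(M, half_length):
    """Vertex distribution at the central time of a uniform ensemble of paths
    extending half_length steps into both the past and the future."""
    power = np.linalg.matrix_power(M, half_length)
    past = power.sum(axis=0)    # number of past paths ending in each vertex
    future = power.sum(axis=1)  # number of future paths starting in each vertex
    weights = past * future
    return weights / weights.sum()

# Hypothetical example: a small directed graph, so phi and psi differ.
M_example = np.array([[1.0, 1.0, 0.0, 0.0],
                      [1.0, 0.0, 1.0, 0.0],
                      [0.0, 1.0, 1.0, 1.0],
                      [1.0, 0.0, 0.0, 1.0]])

lam, phi, psi = dominant_eigenvectors(M_example)
pi_born = phi * psi / np.sum(phi * psi)             # formula (25)
pi_paths = central_vertex_distribution(M_example, half_length=200)
S = M_example * psi[None, :] / (lam * psi[:, None])  # MERW transition matrix
```

For this graph the two distributions agree to numerical precision and the rows of `S` sum to one, which is the role of the $1/\lambda$ coefficient.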

3.4 Examples and localization property 

For better intuition about MERW and its difference from GRW, we will now look at simple examples. For connection with physics, we will use lattice-like graphs, which can be imagined e.g. as a crystal lattice or discretization of a continuous system. Standard lattice is a regular graph, for which GRW and MERW are the same - to observe the difference we can remove the regularity by introducing some defects. We will see that in contrast to GRW, MERW has strong localization properties. Its stationary probability corresponds to quantum mechanical ground state probability distribution, for which Lifshitz argument [18] says that probability is localized in the largest defect-free spherical region. It was used to make some predictions of statistical behavior in [13] and [14] and will be presented here briefly.

3.4.1 One dimensional segment-like graph 

Let us start with one-dimensional case: length $N$ segment-like graph for which we assume that in a single step the walker can jump to one of two neighboring nodes or stay in given position. The last possibility denotes that there are self-loops in vertices - we can introduce defects by removing some of them. In presented numerical simulations we will choose the positions of these defects randomly, like in Fig. 9: choose some probability $p_d$, which is independently used for each node as probability that it is defected. Physical intuition for such simplified model could be that there are two randomly distributed types of atoms in the lattice: most are potential wells for electrons, while the defects are rather repelling. In the next section we will get freedom for choosing these potentials in more physical way.

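Let us introduce the potential representing the positions of self-loops:

$$V_x := \begin{cases} 0 & \text{if self-loop at position } x \text{ is present,}\\ 1 & \text{if self-loop at position } x \text{ is absent.}\end{cases}$$

The eigenequation becomes:

$$(\lambda\psi)_x = (M\psi)_x = \psi_{x-1} + (1 - V_x)\psi_x + \psi_{x+1} \qquad \big|\ -3\psi_x,\ \cdot(-1)$$

$$E\psi_x = -(\Delta\psi)_x + V_x\psi_x \qquad (26)$$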

where $(\Delta\psi)_x := \psi_{x-1} - 2\psi_x + \psi_{x+1}$ is standard discrete Laplacian and $E := 3 - \lambda$ is analogue of quantum mechanical ground energy - maximizing $\lambda$ is equivalent to finding minimal eigenvalue of the found discrete analogue of Schrodinger's operator ($-\Delta + V$). Like for the quantum ground state, stationary probability distribution is $\rho(x) = \psi^2(x)$ ($\psi > 0$). Later we will use more physical potential and make infinitesimal limit, getting thermalization to the quantum ground state probability density of the standard continuous Schrodinger's equation. The fact that obtained Hamiltonian is minus adjacency matrix up to linear transformation, allows to connect it also with Bose-Hubbard Hamiltonian for single particle without potential. We will look later at the general case, but for single particle the space of possibilities becomes the vertices of lattice/graph and so the Hamiltonian $-\sum_{(i,j)\in\mathcal{E}} a_i^\dagger a_j + \text{h.c.}$ is equivalent to minus adjacency matrix.

Figure 9: Stationary probability distribution in logarithmic scale for MERW on 1000 node segment-like graph with cyclic boundary conditions and self-loops from which some small portion was randomly removed: correspondingly 0.002, 0.01 and 0.03. Above the last one, there is presented completely different situation after shifting one defect a single position.

The choice of value 3 in $E = 3 - \lambda$ formula was arbitrary - different choice would change the values of $V$, such that $E - V$ would remain the same. The reason for the used choice is to make most of $V_i$ zero and $E$ a small positive number. For general lattice of dimension $D$ we will use for example $2D + 1$ instead of 3.



The (23) inequality allows to see $\lambda$ as some effective average of degrees of the graph. In our case:

$$3 - p_d = \frac{\sum_i d_i}{N} \leq \lambda \leq \max_i d_i = 3 \quad\Rightarrow\quad 0 \leq E \leq p_d$$

$\lambda$ describes the optimal growth of the number of paths while elongation by a single step ($M\psi = \lambda\psi$). For paths ending in given vertex, this growth of their number is the degree of this vertex - vertices above this average ($d_i > \lambda$ or equivalently $V_i < E$) produce more paths than average. Intuitively it acts as if there was attractive potential there, and repulsive for $V_i > E$ vertices.

For regions of constant potential larger than $E$, like in quantum mechanics the local solution of (26) for such energy barrier leads to tunneling-like exponential behavior:

$$\psi(x) = Ae^{\kappa x} + Be^{-\kappa x} \quad\text{where}\quad \kappa = \text{arccosh}\left(1 + \frac{V-E}{2}\right) \approx \sqrt{V-E} - \frac{(V-E)^{3/2}}{24}$$

for some local parameters $A$ and $B$. Such situations would be natural for example in the opposite model: in which most of vertices would not have self-loops.

In our case we are rather interested in the solution for regions of constant $V$ below $E$:

$$\psi(x) = A\cos(k(x - x_0)) \quad\text{where}\quad k = \arccos\left(1 - \frac{E-V}{2}\right) \approx \sqrt{E-V} + \frac{(E-V)^{3/2}}{24}$$

$A$ and $x_0$ are some local parameters, $x_0$ is the position of maximal value which is not necessarily in the region. The value of $\psi$ cannot drop below 0, so $x \in \left(x_0 - \frac{\pi}{2k},\ x_0 + \frac{\pi}{2k}\right)$. For $E - V \geq 2$, $k \geq \pi/2$ makes the region of positive $\psi$ completely degenerate, so $E - V$ has to be smaller. For example in our case $E$ can be bounded from above asymptotically by $p_d$. The Lifshitz argument says that $\psi$ is approximately zero outside the largest defect-free region, like in Fig. 9. Let us denote the width of this Lifshitz region by $2R$. Probability of $2R$ succeeding nodes without defects behaves like $(1 - p_d)^{2R}$. Such region could start in any node, so for the largest of them $N(1 - p_d)^{2R}$ should be of order of unity, making $2R \approx \left|\frac{\ln(N)}{\ln(1-p_d)}\right|$. The maximum of $\psi$ ($x_0$) is approximately the center of this region, $kR \approx \pi/2$, so

$$E \approx 2 - 2\cos\left(\frac{\pi}{2R}\right) \approx \frac{\pi^2}{4R^2} \approx \left(\frac{\pi|\ln(1-p_d)|}{\ln(N)}\right)^2$$
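The bounds on $E$ and the localization can be illustrated on a small ring. This is only a sketch under stated assumptions: the ring size and the defect positions are hypothetical and fixed by hand (the text draws them randomly with probability $p_d$), so that the largest defect-free region is known in advance.

```python
import numpy as np

def defected_ring(n, defects):
    """Adjacency matrix of an n-node ring with self-loops, removed at `defects`."""
    M = np.zeros((n, n))
    idx = np.arange(n)
    M[idx, (idx + 1) % n] = 1.0
    M[idx, (idx - 1) % n] = 1.0
    M[idx, idx] = 1.0
    M[defects, defects] = 0.0
    return M

n = 300
defects = np.array([20, 45, 60, 150, 170, 280])  # hypothetical defect positions
M = defected_ring(n, defects)

eigvals, eigvecs = np.linalg.eigh(M)   # ascending eigenvalues of symmetric M
lam = eigvals[-1]
psi = np.abs(eigvecs[:, -1])           # dominant eigenvector, chosen nonnegative
E = 3.0 - lam                          # analogue of the ground state energy

rho_merw = psi**2 / np.sum(psi**2)     # MERW: square of the eigenvector
degrees = M.sum(axis=1)
rho_grw = degrees / degrees.sum()      # GRW: proportional to vertex degree

two_R = 280 - 170 - 1                  # width of the largest defect-free region
E_lifshitz = 2.0 - 2.0 * np.cos(np.pi / two_R)
```

Here `E` stays between 0 and the realized defect fraction `len(defects) / n`, the maximum of `rho_merw` falls inside the largest defect-free region (nodes 171 to 279), and `E_lifshitz` gives the order of magnitude of `E`, while `rho_grw` is nearly flat.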

Stationary probability in GRW is proportional to the degree of given vertex, so here it would be just constant for most of vertices - it has practically no localization properties. We see that situation in MERW is completely different - each defect influences the whole system. The right hand graphs in Fig. 9 show how strong this effect is by presenting surprising agreement with the Lifshitz argument while shifting one defect by a single position. This rapid change seems nonintuitive, but it does not mean that the eigenvector changes so drastically, only that the order of eigenvalues of the first two eigenvectors was changed.

3.4.2 Two dimensional defected lattice 

Let us now look at a two dimensional lattice constructed in analogous way, with self-loops in all but some randomly chosen portion of vertices. The dominant eigenvector is again the ground state of the discrete Schrodinger equation (26), but using two-dimensional discrete Laplacian:

$$(\Delta\psi)_{x,y} := \psi_{x-1,y} + \psi_{x+1,y} + \psi_{x,y-1} + \psi_{x,y+1} - 4\psi_{x,y}$$

Lifshitz argument suggests that probability distribution should be localized in the largest defect-free sphere. From presented numerical results we see that situation is more complicated now, but intuitively it localizes in the largest nearly spherical defect-free region.

Two-dimensional example makes it more convenient to compare dynamics of GRW and MERW. In Fig. 10 there is an example of such comparison of evolution of probability density starting with known walker's position - probability density concentrated in a single point. We see that after ten steps for both models we can expect similar probability distributions - there is no large difference between their local behavior and transition probabilities (up to a few percent). However, as time passes the difference grows. GRW behaves as if there were practically no defects and finally thermalizes to a nearly uniform distribution. Dynamics of MERW is much more complicated. The defects create some entropic landscape - the probability localizes first in some defect-free region of large local entropy production and soaks to finally get to the deepest entropic well.

Figure 10: Comparison of evolution of probability density of GRW and MERW on 40x40 lattice graph with cyclic boundary conditions and self-loops in all vertices but some randomly chosen 0.1 of them (represented by squares). The initial situation is probability localized in a single vertex (known position) and the graphs represent probability density after correspondingly 10, 100 and 1000 steps.

The discrete propagator (12) can be written:



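$$(S^t)_{ij} = \frac{(M^t)_{ij}}{\lambda^t}\frac{\psi_j}{\psi_i} = \frac{\psi_j}{\psi_i}\sum_k\left(\frac{\lambda_k}{\lambda}\right)^t(\psi_k)_i(\varphi_k)_j$$

where we have used eigenvalue decomposition of $M = \sum_k \lambda_k\psi_k\varphi_k^T$. Eigenvectors $\psi_k$, $\varphi_k$ are real and fulfill:

$$\lambda = \lambda_0 > \lambda_1 \geq \ldots \geq \lambda_{N-1}, \qquad \psi = \psi_0, \quad \varphi = \varphi_0$$

For symmetric $M$, $\varphi_k = \psi_k$. In quantum mechanics they would be stable eigenstates, while here higher states deexcite toward lower ones and finally thermalize in the ground state.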
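The deexcitation toward the ground state can be followed directly in the eigendecomposition of the propagator. The sketch below is illustrative only: it assumes a symmetric $M$ (so $\varphi_k = \psi_k$) on a hypothetical 30-node ring with hand-picked defects, and compares the decomposed propagator with plain matrix powers of $S$.

```python
import numpy as np

n = 30
idx = np.arange(n)
M = np.zeros((n, n))
M[idx, (idx + 1) % n] = 1.0
M[idx, (idx - 1) % n] = 1.0
M[idx, idx] = 1.0
M[[3, 11, 12, 25], [3, 11, 12, 25]] = 0.0    # hypothetical defects (removed self-loops)

eigvals, vecs = np.linalg.eigh(M)            # columns of vecs: orthonormal psi_k
lam = eigvals[-1]
psi = np.abs(vecs[:, -1])                    # dominant eigenvector, strictly positive

S = M * psi[None, :] / (lam * psi[:, None])  # MERW transition matrix

def propagator(t):
    """(S^t)_ij = psi_j/psi_i * sum_k (lam_k/lam)^t (psi_k)_i (psi_k)_j for integer t."""
    scaled = (eigvals / lam) ** t
    kernel = (vecs * scaled) @ vecs.T
    return kernel * psi[None, :] / psi[:, None]

S_100 = propagator(100)
```

Each term with $k > 0$ is damped by $(\lambda_k/\lambda)^t$, so for large $t$ every row of `propagator(t)` approaches the stationary distribution `psi**2`.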
For one dimensional defected lattice the second eigenvector was previously localized in the second largest defect-free region. Figure 11 suggests that this intuition may continue to a few further eigenvectors - in this figure the first three eigenvectors visually correspond to three largest defect-free regions. However, the fourth one seems to disagree with this rule, so generally we should be careful about it. The intuition about MERW dynamics it provides is that temporary domination of given coordinate is responsible for localization in corresponding local entropic well - for example MERW density after 100 steps in Fig. 10 is similar to the $k = 1$ eigenvector and later it finally deexcites to the ground state. The initial coordinates in this eigenvalue decomposition depend on overlapping of given eigenvector with the initial probability distribution. During evolution, the speed of dominance of larger eigenvalue coordinates depends on proportion between eigenvalues. Finally, the intuition is that probability will first localize in the nearest defect-free region (local entropic well), then it will relax into succeeding larger Lifshitz regions and finally thermalize in the ground state. If because of some additional constraints lower energy states are somehow restricted, presented picture suggests evolution should be "stochastically shifted" toward near (overlapping) eigenstate.

Figure 11: Plots of absolute values of the first four eigenvectors of graph from the previous figure. The first one is positive, while the rest of them have regions of constant sign, which are separated by boundaries of near zero values, represented by white color.

Assuming that the defect-free region is indeed approximately a sphere, like previously we would expect that $N(1 - p_d)^{\pi R^2}$ is of order of unity: $R \approx \sqrt{\frac{\ln(N)}{\pi|\ln(1-p_d)|}}$.

Eigenvector $\psi$ should have maximum near the center of this sphere and has approximately spherical symmetry - such local eigenfunction solution is approximately Bessel function $J_0$: $\psi(r) \approx J_0(jr/R)$, where $r$ is the distance from the center and $j \approx 2.404825$ is the first zero of $J_0$. Finally we obtain:

$$E \approx \left(\frac{j}{R}\right)^2 \approx \pi j^2\,\frac{|\ln(1-p_d)|}{\ln(N)}$$

For the general dimension of lattice $D$, skipping lattice-dependent multiplicative constants, the above estimates become:

$$R \propto \left(\frac{\ln(N)}{|\ln(1-p_d)|}\right)^{1/D}, \qquad E \propto \left(\frac{|\ln(1-p_d)|}{\ln(N)}\right)^{2/D}$$
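The one and two dimensional Lifshitz estimates can be collected into two small helpers. This is a sketch of the formulas above only; the function names and the sample values of `N` and `p_d` are hypothetical.

```python
import math

J0_FIRST_ZERO = 2.404825  # first zero of the Bessel function J_0, as quoted in the text

def lifshitz_1d(N, p_d):
    """Width 2R of the largest defect-free segment and the energy estimate E."""
    two_R = abs(math.log(N) / math.log(1.0 - p_d))
    E = (math.pi / two_R) ** 2
    return two_R, E

def lifshitz_2d(N, p_d):
    """Radius R of the largest defect-free disc and the energy estimate E."""
    R = math.sqrt(math.log(N) / (math.pi * abs(math.log(1.0 - p_d))))
    E = (J0_FIRST_ZERO / R) ** 2
    return R, E

two_R, E_1d = lifshitz_1d(N=1000, p_d=0.01)
R_2d, E_2d = lifshitz_2d(N=1600, p_d=0.1)
```

Both helpers solve the "order of unity" condition exactly, so they give only the leading behavior, not the lattice-dependent corrections.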

The previous examples were using undirected graphs and so symmetric $M$. Let us now briefly look at MERW on a modification of these graphs: each undirected horizontal edge is replaced by a directed edge toward the right hand side of the plot. Examples of numerical results are in the bottom row of Fig. 12. The asymmetry makes left and right eigenvectors no longer the same (in practical cases their difference seems to be rather insignificant). Thanks to the previous symmetry, the detailed balance condition was fulfilled: for each edge, probability flow in both directions was equal. This time probability flows only in one horizontal direction - there is nearly uniform flow for low defect rate and it localizes near low defect paths for larger rates. The stationary flow allows to imagine this situation as a simplified conductance model. While in GRW the flow would be nearly uniform, in MERW there appears some analogue of avalanche breakdown. For more realistic models, instead of forbidding some transitions, a potential gradient should be introduced. The required methodology will be introduced in the next section.

Figure 12: Stationary probability of MERW on 40 x 40 lattice graph with cyclic boundary conditions and self-loops in all vertices but some randomly chosen portion of them: correspondingly 0.02, 0.05, 0.1, 0.2. In the upper row the graph is undirected, while in the lower row all horizontal undirected edges were replaced with directed edges toward right of the plot, to simulate conductance in this direction.
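The difference between the undirected and directed case can be seen in the stationary probability flow $\pi_i S_{ij}$. The following sketch uses a hypothetical 6 x 6 lattice with three hand-picked defects instead of the 40 x 40 lattices of Fig. 12; helper names are illustrative.

```python
import numpy as np

def torus_lattice(size, defects, directed=False):
    """size x size lattice with cyclic boundaries and self-loops except at `defects`.
    With directed=True horizontal edges point only toward increasing column index."""
    n = size * size
    M = np.zeros((n, n))
    for r in range(size):
        for c in range(size):
            i = r * size + c
            if (r, c) not in defects:
                M[i, i] = 1.0
            M[i, ((r + 1) % size) * size + c] = 1.0
            M[i, ((r - 1) % size) * size + c] = 1.0
            M[i, r * size + (c + 1) % size] = 1.0
            if not directed:
                M[i, r * size + (c - 1) % size] = 1.0
    return M

def merw(M):
    """MERW transition matrix S and stationary distribution pi for irreducible M."""
    w, v = np.linalg.eig(M)
    k = np.argmax(w.real)
    lam, psi = w[k].real, np.abs(v[:, k].real)
    wl, vl = np.linalg.eig(M.T)
    phi = np.abs(vl[:, np.argmax(wl.real)].real)
    S = M * psi[None, :] / (lam * psi[:, None])
    pi = phi * psi / np.sum(phi * psi)
    return S, pi

defects = {(0, 0), (2, 3), (4, 1)}   # hypothetical defect positions
S_u, pi_u = merw(torus_lattice(6, defects))
S_d, pi_d = merw(torus_lattice(6, defects, directed=True))

flow_u = pi_u[:, None] * S_u         # probability flow i -> j per step
flow_d = pi_d[:, None] * S_d
```

In the undirected case `flow_u` is a symmetric matrix (detailed balance), while `flow_d` is not, although `pi_d` is still stationary.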



3.5 Various transition times 

It was essential in the used formalism that all transitions last the same amount of time. We will not use it further, but for generality let us expand these considerations to situation in which edges could require different times of transition. The construction will be presented for transition times being natural multiples of some chosen unit time, so it can also be used for rational proportions by dividing the time unit by the lowest common denominator. Irrational proportions would rather require approximating by rational ones.



Let us first look at the upper part of Fig. 13 - for given graph we can construct period 2 (or $p$ generally) graph, whose second ($p$-th) power is the original graph while restricting to the connected component of main vertices. For multi-edge or weighted graph, the weights on such path replacing an edge should be chosen to multiply to the original weight, for example as corresponding root of the weight of the original edge.

Like on the lower part of Fig. 13, we can analogously replace edges with one-directional paths of chosen length and calculate MERW formulas for such extended graph. Now while fixing two vertices and some length, each pathway of this length on such extended graph is equally probable. Original edges correspond to subpaths on these paths, with lengths being their transition times - the pathway equiprobability condition becomes as we would expect. Obtained stationary probability distribution would have nonzero values in auxiliary vertices - it denotes probability of situations in which specific transition was already chosen, but not yet finalized. If we are interested in probability distribution among the main vertices, we can for example interpret auxiliary vertices as preparation for transition and so move their stationary probabilities to the corresponding starting main vertex.

Figure 13: Top: example of construction of square root analogue of graph. Bottom: example of construction of graph to handle various transition times.

For multi-edge or weighted graph the weights of edges should be chosen to multiply to the original edge weight. There is some freedom of such choice, but these weights are not used independently - weights of full paths would not depend on this choice (with fixed product) and so derivation from 3.3 would always lead to the same MERW formulas for all such weight distributions.
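The construction can be sketched as follows. This is an illustrative implementation, not the author's: `extend_graph` and the 3-vertex weighted example are hypothetical, and each edge weight is split evenly as the corresponding root, which is only one of the allowed choices.

```python
import numpy as np

def extend_graph(M, times):
    """Replace edge (i, j) by a one-directional path of times[i, j] steps through
    new auxiliary vertices; each step carries the times[i, j]-th root of M[i, j]."""
    n = M.shape[0]
    edges = [(i, j) for i in range(n) for j in range(n) if M[i, j] != 0]
    n_aux = sum(int(times[i, j]) - 1 for i, j in edges)
    M_ext = np.zeros((n + n_aux, n + n_aux))
    next_aux = n
    for i, j in edges:
        tau = int(times[i, j])
        step_weight = M[i, j] ** (1.0 / tau)
        prev = i
        for _ in range(tau - 1):
            M_ext[prev, next_aux] = step_weight
            prev = next_aux
            next_aux += 1
        M_ext[prev, j] = step_weight
    return M_ext

# Hypothetical weighted graph; all transitions take 2 time units.
M = np.array([[1.0, 2.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
M_ext = extend_graph(M, np.full((3, 3), 2))

M_ext_squared = M_ext @ M_ext
lam = np.max(np.linalg.eigvals(M).real)
lam_ext = np.max(np.linalg.eigvals(M_ext).real)
```

With all times equal to 2 this is the "square root" graph from the upper part of Fig. 13: the second power of `M_ext` restricted to the main vertices reproduces `M`, and the dominant eigenvalue is the square root of the original one. Passing a matrix of mixed integer times gives the graph from the lower part.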



3.6 Summary 

GRW is appropriate for a walker which indeed makes succeeding random decisions, using 
exactly uniform probability distribution among the nearest possibilities. MERW should be un- 
derstood in completely different way: the walker does not have to make random decisions - 
the randomness represents only our lack of knowledge. The walker chooses a path in practi- 
cally any allowed way (can be deterministic) and because we do not know which path he is 
choosing, we assume some natural thermodynamical ensemble of possible scenarios - paths. 

Obtaining MERW transition probabilities requires knowing the eigenvector, which depends on the whole system - this effective model is nonlocal. It should not be interpreted as requiring nonlocality in walker's behavior - the walker can choose the path in any way he wants. Nonlocality is only a natural feature of models representing our knowledge - distant event may give us missing information, like thanks to angular momentum conservation, spin of one particle gives us information about the spin of a coupled one in EPR experiment. In MERW case, nonlocality means that to make the best predictions, we should know the whole space of possibilities. Later considering time dependent case we will see that we should also know future potential. Models representing our knowledge can have also retrocausality like in Wheeler's experiment - it only means that further event may give us missing information about past events.

4 Boltzmann paths and infinitesimal limit

In this section we will make basic expansions of mathematical constructions from the previous 
section to make them more physical - add potential and then make infinitesimal limit. 



4.1 Adding potential - Boltzmann paths 

If there is no reason to emphasize some of scenarios, the best assumption is to choose uniform probability distribution among them. Standard way of emphasizing some scenarios in physics is by assigning them energy - for example trajectory remaining in potential well should be more probable than trajectory tunneling through a barrier, which should be still more probable than trajectory remaining on the top of this barrier, like in Fig. 14. In such situations we maximize entropy while fixing total energy, or equivalently: assume some compromise between maximizing entropy and minimizing average energy. It leads to Boltzmann distribution:

$$\max_{(p_i):\ \sum_i p_i = 1}\left(-\sum_i p_i\ln(p_i) - \beta\sum_i p_i E_i\right) = \ln\left(\sum_i e^{-\beta E_i}\right) \quad\text{for}\quad p_i \propto e^{-\beta E_i} \qquad (28)$$

where $\beta = 1/k_B T$, $k_B \approx 1.3807\cdot 10^{-23}$ J/K is Boltzmann constant and $T$ is temperature. $\beta$ controls this compromise: the higher it is (the lower $T$), the more important choosing low energy is. In zero temperature only the lowest energy states would be chosen, while in infinite temperature energy differences would vanish. Minus maximized expression is free energy up to constant.

Figure 14: Expansion toward Schrodinger equation - emphasizing paths using potential energy and making infinitesimal limit.
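Formula (28) can be verified numerically on a few levels. In this minimal sketch the energy levels, the value of $\beta$ and the random trial distributions are all hypothetical choices made for illustration.

```python
import numpy as np

def objective(p, energies, beta):
    """Entropy minus beta times average energy, the quantity maximized in (28)."""
    p = np.asarray(p, dtype=float)
    nonzero = p > 0
    entropy = -np.sum(p[nonzero] * np.log(p[nonzero]))
    return entropy - beta * np.sum(p * energies)

energies = np.array([0.0, 0.5, 1.0, 3.0])   # hypothetical energy levels
beta = 2.0

boltzmann = np.exp(-beta * energies)
Z = boltzmann.sum()                          # partition function
boltzmann /= Z

rng = np.random.default_rng(1)
trials = rng.dirichlet(np.ones(len(energies)), size=1000)  # random distributions
best_trial = max(objective(p, energies, beta) for p in trials)
```

The Boltzmann distribution reaches exactly $\ln Z$, and no randomly drawn distribution exceeds it.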



Let us define the weights using energy required for given transition:

$$M_{ij} = A_{ij}e^{-\beta V_{ij}} \qquad (29)$$

where $V$ is some potential - usually it will be scalar potential: depending only on the position, like $V_{ij} = (V_i + V_j)/2$, but we can also use vector potential from electromagnetism, for which $V_{ij}$ could be essentially different from $V_{ji}$. If we would be interested in random walk in phase space, this term could be used to additionally introduce kinetic energy to the considerations. Eventual lack of edge between some vertices can be seen as an infinite potential barrier.

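Thanks to (29) convention, formula (13) for MERW probability of $(\gamma_i)_{i=0}^{l}$ path becomes (for a simple graph, $A_{ij} \in \{0,1\}$):

$$\frac{1}{\lambda^l}\frac{M_{\gamma_0\gamma_1}\cdots M_{\gamma_{l-1}\gamma_l}\,\psi_{\gamma_l}}{\psi_{\gamma_0}} = \frac{e^{-\beta(V_{\gamma_0\gamma_1}+V_{\gamma_1\gamma_2}+\ldots+V_{\gamma_{l-1}\gamma_l})}}{\lambda^l}\frac{\psi_{\gamma_l}}{\psi_{\gamma_0}}$$

So in this interpretation, instead of calling it uniform probability among pathways, we have Boltzmann distribution among paths by using:

Definition 21. Energy of path $(\gamma_i)_{i=0}^{l}$ is $V_{\gamma_0\gamma_1} + V_{\gamma_1\gamma_2} + .. + V_{\gamma_{l-1}\gamma_l}$.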

Let us now look at the previous entropy formula (21):

$$H(S) = -\sum_i\pi_i\sum_j S_{ij}\lg(S_{ij}) + \sum_i\pi_i\sum_j S_{ij}\lg(M_{ij})$$

The left hand side sum is already entropy of given random walk. The right hand side sum was previously required to take into consideration that there can be multiple edges between given vertices: a choice of probability $S_{ij}$ was in fact $M_{ij}$ choices of $S_{ij}/M_{ij}$ probability. For weighted graphs this interpretation is far-fetched, so we will further use more physical one (29) - that values of $M$ no longer represent the number of edges, but correspond to the energy of given transitions.

In this interpretation transition probabilities correspond to single choices, so entropy production per step is just

$$S := -k_B\sum_i\pi_i\sum_j S_{ij}\ln(S_{ij})$$

where instead of previous base 2 logarithms, we have used Boltzmann's normalization, more appropriate for physics. $\pi_i S_{ij}$ is probability of $(ij)$ situation, so the second sum in $H(S)$ is, up to a constant factor, average energy per step:

$$\sum_i\pi_i\sum_j S_{ij}\lg(M_{ij}) = -\frac{\beta}{\ln(2)}\,U \quad\text{where}\quad U := \sum_i\pi_i\sum_j S_{ij}V_{ij}$$

Finally, we see that in this energy interpretation of weights (29), $H(S)$ is up to constant just minus average free energy per step:

$$F = U - TS = -k_B T\ln(2)\,H(S) \qquad (31)$$

For the complete picture, let us look at the partition function

$$Z^l := \sum_{\gamma_0,\gamma_l}(M^l)_{\gamma_0\gamma_l}$$

it is asymptotically proportional to $\lambda^l$, so we can check that the free energy per step is as expected in thermodynamics:

$$F = -\frac{1}{\beta}\lim_{l\to\infty}\frac{\ln(Z^l)}{l} = -k_B T\ln(\lambda)$$
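The three expressions for the free energy per step can be compared on a small example. This sketch assumes units with $k_B = 1$, a hypothetical 8-node ring with self-loops and randomly drawn site potentials, and the scalar choice $V_{ij} = (V_i + V_j)/2$.

```python
import numpy as np

beta = 1.5                       # 1/(k_B T) in units with k_B = 1 (assumption)
n = 8
rng = np.random.default_rng(0)
V = rng.uniform(0.0, 2.0, n)     # hypothetical site potentials

idx = np.arange(n)
A = np.zeros((n, n))
A[idx, idx] = A[idx, (idx + 1) % n] = A[idx, (idx - 1) % n] = 1.0
V_edge = (V[:, None] + V[None, :]) / 2.0
M = A * np.exp(-beta * V_edge)   # weights (29)

eigvals, vecs = np.linalg.eigh(M)
lam, psi = eigvals[-1], np.abs(vecs[:, -1])
S = M * psi[None, :] / (lam * psi[:, None])   # MERW transition matrix
pi = psi**2 / np.sum(psi**2)                  # stationary distribution

edges = A > 0
flow = pi[:, None] * S                        # probability of the (ij) situation
entropy = -np.sum(flow[edges] * np.log(S[edges]))   # entropy per step
U = np.sum(flow[edges] * V_edge[edges])             # average energy per step
F_walk = U - entropy / beta                         # F = U - T S
F_lambda = -np.log(lam) / beta                      # F = -k_B T ln(lambda)

l = 200
Z_l = np.linalg.matrix_power(M, l).sum()            # partition function Z^l
F_partition = -np.log(Z_l) / (beta * l)
```

`F_walk` and `F_lambda` coincide exactly, while `F_partition` approaches them like $1/l$.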

For simplicity we will further call both approaches MERW, but for better intuition here are gathered differences between these mathematically equivalent interpretations - for this and further sections we will use the second one:

If instead we would construct GRW from $M$ ($S^G_{ij} = M_{ij}/d_i$, where $d_i = \sum_j M_{ij}$), the walker would also assume Boltzmann distribution - this time not among full paths (scenarios), but only among the nearest neighbors (single steps) - minimizing free energy locally, in a way depending on discretization. If $M$ is symmetric like $M_{ij} = e^{-\beta(V_i+V_j)/2}$, as previously there is simple formula for stationary probability distribution:

$$\pi^G_i = \frac{d_i}{\sum_j d_j} = \frac{\sum_j M_{ij}}{\sum_{j,k} M_{jk}}$$
4.2 Boltzmann paths on lattices 

In the previous section we have seen analogy between the dominant eigenvector and quantum ground state for lattice type graphs. Having energetic interpretation of weighted graphs, we can take it further. Such lattice can for example represent regular lattice of a crystal or defected lattice of a semiconductor. It can also represent discretization of a continuous system - later we will make infinitesimal limit to get to the continuous case. Lattices considered in practice are often finite - to approximate infinite lattice, a finite one with cyclic boundary conditions can be used. For simplicity let us assume we use a finite one, but the considerations can be also generalized to infinite graphs.

So let us assume that we want to model a part of $\mathbb{R}^D$, where usually dimension $D$ is 2 or 3. We cover its part by a lattice $\{0, .., m-1\}^D$ - it could overlap with a crystal lattice, or just represent discretization of a continuous problem. For simplicity let us assume that it is rectangular lattice with the same constants in all directions ($\delta > 0$), so $(x_i)_{i=1}^D \in \mathbb{Z}^D$ represents for example $\mathbf{x} := (\delta x_i)_{i=1}^D \in \mathbb{R}^D$. Another simplifying assumption is that in a single step there is allowed transition to at most the nearest neighbors. Let us also assume cyclic boundary conditions - that $0$ and $m-1$ coordinates are adjacent, so "+1" and "-1" below will be made modulo $m$. Finally all vertices have exactly $2D + 1$ neighbors (including itself).

Now we have to choose the potential function - for this moment depending only on position. To model electron in crystal lattice, it may represent tendency to remain near given atom (like electronegativity). For discretization of a continuous system, it can be just the average potential in given cell (integral of potential divided by volume). Let us choose time discretization: single transition corresponds to $\epsilon > 0$ time. Finally allowing the walker also to remain in given vertex, for $D = 1$ case weights can be for example chosen as:

$$(M_\epsilon)_{i,i+1} = (M_\epsilon)_{i+1,i} = e^{-\beta\epsilon\frac{V_i+V_{i+1}}{2}}, \qquad (M_\epsilon)_{ii} = e^{-\beta\epsilon V_i} \qquad (33)$$

where index arithmetic is modulo $m$. It was chosen to make $M$ symmetric - such that energy of path $(\gamma_i)_{i=0}^{l}$ is

$$\epsilon\left(\frac{V_{\gamma_0}}{2} + V_{\gamma_1} + \ldots + V_{\gamma_{l-1}} + \frac{V_{\gamma_l}}{2}\right)$$

For simplicity physical dimensions will be omitted in this paper, but physically $\epsilon$ is time, $V$ is energy, so Boltzmann distribution instead of energy uses energy multiplied by time (action) in this case. Analogously $\beta$ is not one over energy as usual, but one over action.

To find MERW in this case, let us look at eigenvector equations for this $M$:

$$\lambda_\epsilon\psi_i = (M_\epsilon\psi)_i = e^{-\beta\epsilon\frac{V_{i-1}+V_i}{2}}\psi_{i-1} + e^{-\beta\epsilon V_i}\psi_i + e^{-\beta\epsilon\frac{V_i+V_{i+1}}{2}}\psi_{i+1} \qquad (34)$$






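We can make a few approximations like $e^{-\epsilon} \approx 1 - \epsilon$ for small $\epsilon$:

$$\lambda_\epsilon\psi_i \approx \left(1 - \beta\epsilon\frac{V_{i-1}+V_i}{2}\right)\psi_{i-1} + (1 - \beta\epsilon V_i)\psi_i + \left(1 - \beta\epsilon\frac{V_i+V_{i+1}}{2}\right)\psi_{i+1}$$

Solving this kind of equations would be required for lattice of atoms in which the potential could vary from site to site. If the lattice was made to discretize a continuous system, for small lattice constant ($\delta$), in $\epsilon$ order we can assume that $V$ and $\psi$ are nearly constant:

$$\lambda_\epsilon\psi_i \approx (M_\epsilon\psi)_i \approx \psi_{i-1} + \psi_i + \psi_{i+1} - 3\epsilon\beta V_i\psi_i \qquad \Big|\ -3\psi_i,\ \cdot\frac{-1}{3\beta\epsilon} \qquad (35)$$

$$\frac{3 - \lambda_\epsilon}{3\beta\epsilon}\psi_i \approx -\frac{1}{3\beta\epsilon}\left(\psi_{i-1} - 2\psi_i + \psi_{i+1}\right) + V_i\psi_i$$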

$\psi_{i-1} - 2\psi_i + \psi_{i+1}$ is used as a discrete Laplacian. If we divide it by $\delta^2$, it becomes approximation of continuous Laplacian. Average distance in diffusion grows with square root of passed time, so for infinitesimal limit some relation between time and space step has to be assumed, like

$$\epsilon = \frac{\delta^2}{(2D+1)\alpha} \qquad (36)$$

where $\alpha > 0$ is some parameter we can freely choose. Using this substitution and making above derivation for general dimension $D$, above "3" coefficient becomes $2D + 1$ and finally we get:

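$$E_\epsilon\psi_x \approx -\frac{\alpha}{\beta}\sum_{i=1}^{D}\frac{\psi_{x-e_i} - 2\psi_x + \psi_{x+e_i}}{\delta^2} + V_x\psi_x \qquad (37)$$

where $x = (x_1, .., x_D) \in \mathbb{Z}^D$, $e_i$ denotes the unit vector in the $i$-th direction and

$$E_\epsilon := \frac{2D + 1 - \lambda_\epsilon}{(2D+1)\beta\epsilon} \qquad (38)$$

Because of multiplying by negative number, maximizing over $\lambda$ becomes minimizing over $E$ - for properly chosen $\alpha$ and $\beta$, such that $\frac{\alpha}{\beta} = \frac{\hbar^2}{2m}$, the eigenvector $\psi$ would be the ground state amplitude of discretization of Schrodinger's equation and the stationary probability density would be the same as for quantum mechanical ground state.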
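A quick numerical check of this correspondence for $D = 1$ is sketched below. The assumptions are mine, not the paper's: units with $\hbar = m = 1$ (so $\alpha/\beta = 1/2$), $\beta = 1$, a harmonic test potential $V(x) = x^2/2$ whose continuous ground state energy is $1/2$, and a 400-site lattice with $\delta = 0.05$.

```python
import numpy as np

beta, alpha = 1.0, 0.5           # assumption: hbar = m = 1, so alpha / beta = 1/2
m_sites, delta = 400, 0.05
x = (np.arange(m_sites) - m_sites // 2) * delta
V = 0.5 * x**2                   # hypothetical test potential: harmonic oscillator
eps = delta**2 / (3.0 * alpha)   # time step (36) for D = 1

idx = np.arange(m_sites)
nxt = (idx + 1) % m_sites
M = np.zeros((m_sites, m_sites))
M[idx, idx] = np.exp(-beta * eps * V)             # weights (33): staying in place
hop = np.exp(-beta * eps * (V + V[nxt]) / 2.0)    # weights (33): nearest neighbors
M[idx, nxt] = hop
M[nxt, idx] = hop

lam = np.linalg.eigvalsh(M)[-1]
E_eps = (3.0 - lam) / (3.0 * beta * eps)          # definition (38) for D = 1

# Direct discretization of H = -(alpha/beta) * Laplacian + V for comparison.
lap = np.zeros((m_sites, m_sites))
lap[idx, idx] = -2.0
lap[idx, nxt] = 1.0
lap[nxt, idx] = 1.0
H = -(alpha / beta) * lap / delta**2 + np.diag(V)
E_schrodinger = np.linalg.eigvalsh(H)[0]
```

Under these assumptions `E_eps` computed from the dominant eigenvalue of the weight matrix is close both to the lowest eigenvalue of the discretized Hamiltonian and to the continuous value $1/2$.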

4.3 Infinitesimal limit - Boltzmann trajectories 

We would like now to make $\epsilon \to 0^+$ limit to get Boltzmann distribution among continuous trajectories:

$$P(\text{path } \gamma) \text{ is proportional to } e^{-\beta\int V(\gamma(t))\,dt}$$

As it was mentioned, this time in Boltzmann distribution we use energy of path instead of energy - multiplied by time like for action. The choice of $\beta$ is arbitrary, but considering time dependent case, similarity to quantum formalism (63) will suggest to use $\beta = 1/\hbar$.
The eigenvector becomes a function such that

$$\Psi(X) \approx \psi_x \quad\text{where}\quad X := (\delta x_i)_{i=1}^D \in \mathbb{R}^D \qquad (39)$$

The right hand side of (37) becomes

$$H\Psi(X) := -\frac{\alpha}{\beta}\Delta\Psi(X) + V(X)\Psi(X)$$

where $\Delta = \sum_i\frac{\partial^2}{\partial X_i^2}$ is Laplacian. If we choose

$$\alpha = \frac{\hbar^2}{2m}\beta \qquad (40)$$

$H$ becomes Hamiltonian of Schrodinger equation ($-\frac{\hbar^2}{2m}\Delta + V$). Finally eigenvector equation (37) becomes in the limit:

$$H\Psi = E\Psi$$

where $E$ is the lowest possible eigenvalue (the ground state energy):

$$E = \lim_{\epsilon\to 0^+}E_\epsilon = \lim_{\epsilon\to 0^+}\frac{2D + 1 - \lambda_\epsilon}{(2D+1)\beta\epsilon}$$

The stationary probability density for such MERW limit is

$$\rho(X) := \Psi^2(X) \quad\text{for normalized } \Psi:\quad \int\Psi^2(X)\,dX = 1 \qquad (41)$$



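Let us now find the continuous propagator. The $S_{ij}$ matrix loses its meaning in infinitesimal time step limit, but we can use the $(S^t)_{ij} = \frac{(M^t)_{ij}}{\lambda^t}\frac{\psi_j}{\psi_i}$ MERW propagator, in which time $t$ corresponds to $t/\epsilon$ steps:

$$(S_\epsilon^{t/\epsilon})_{xy} = \left(\left(\frac{M_\epsilon}{\lambda_\epsilon}\right)^{t/\epsilon}\right)_{xy}\frac{\psi_y}{\psi_x}$$

In the $\epsilon \to 0^+$ limit, eigenvector $\psi$ becomes eigenfunction $\Psi$. Let us focus on $\frac{M_\epsilon - \lambda_\epsilon}{\epsilon\lambda_\epsilon}$ fraction using (35) approximation for $D = 1$:

$$\left(\frac{M_\epsilon - \lambda_\epsilon}{\epsilon\lambda_\epsilon}\psi\right)_i \approx \frac{\psi_{i-1} + \psi_i + \psi_{i+1} - 3\epsilon\beta V_i\psi_i - \lambda_\epsilon\psi_i}{\epsilon\lambda_\epsilon} = \frac{\psi_{i-1} - 2\psi_i + \psi_{i+1}}{\epsilon\lambda_\epsilon} + \frac{3 - 3\epsilon\beta V_i - \lambda_\epsilon}{\epsilon\lambda_\epsilon}\psi_i$$

using (38) definition ($\lambda_\epsilon = 3 - 3\beta\epsilon E_\epsilon$), the first fraction above leads to $\alpha\Delta$, the second to

$$\frac{3 - 3\epsilon\beta V_i - (3 - 3\beta\epsilon E_\epsilon)}{\epsilon\lambda_\epsilon} = \frac{3}{\lambda_\epsilon}\beta(E_\epsilon - V_i) \to \beta(E - V)$$

For general $D$, as previously above 3 changes into $2D + 1$, so finally

$$\frac{M_\epsilon - \lambda_\epsilon}{\epsilon\lambda_\epsilon} \quad\text{tends to}\quad \alpha\Delta - \beta V + \beta E = \beta(E - H)$$

$$\left(\frac{M_\epsilon}{\lambda_\epsilon}\right)^{t/\epsilon} = \left(1 + \epsilon\,\frac{M_\epsilon - \lambda_\epsilon}{\epsilon\lambda_\epsilon}\right)^{t/\epsilon} \quad\text{tends to}\quad \exp\left(t\beta(E - H)\right) = \frac{e^{-t\beta H}}{e^{-t\beta E}}$$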



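To obtain coordinates of a matrix, we can multiply it on both sides by canonical vectors ($M_{ij} = e_i^T M e_j$ where $e_i = (\delta_{ij})_j$). In continuous limit we analogously multiply them by Dirac deltas - let us use notation from quantum mechanics to write the final propagator ($\langle x|\Psi\rangle = \Psi(x)$):

$$S^t(x,y) = \frac{\langle x|e^{-t\beta H}|y\rangle}{e^{-t\beta E}}\,\frac{\langle y|\Psi\rangle}{\langle x|\Psi\rangle} \qquad (42)$$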



It can be imagined that if we know that in given moment the walker has position $x$, probability density that it will be in $y$ after time $t$ is $S^t(x,y)$. It seems that there is a problem with points where $\Psi$ vanishes, like inside an infinite energy barrier - stationary probability density there is also zero and infinite propagator means that the walker would immediately escape from there.

The $\langle x|e^{-t\beta H}|y\rangle$ term is the propagator from euclidean path integrals - it is called the kernel and it describes local evolution. The additional term is required to make the propagator stochastic and it depends on the whole system, making this model nonlocal - to make the best predictions, we should know the whole system. This stochastic model only represents our knowledge - the walker does not directly use it, there is no need for nonlocality governing its behavior.

The graph is regular, so for constant V propagator becomes the same as for GRW - lead- 
ing to the Brownian motion with a/ ^ diffusion coefficient. Generally it has much stronger 
localization properties. 

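Let us check that this propagator is properly normalized, composes correctly and leads to the expected stationary probability density:

$$\int S^t(x,y)\,dy = \int\frac{\langle x|e^{-t\beta H}|y\rangle\langle y|\Psi\rangle}{\langle x|\Psi\rangle e^{-t\beta E}}\,dy = \frac{\langle x|e^{-t\beta H}|\Psi\rangle}{\langle x|\Psi\rangle e^{-t\beta E}} = 1$$

$$\int S^t(x,y)S^s(y,z)\,dy = \int\frac{\langle x|e^{-t\beta H}|y\rangle}{e^{-t\beta E}}\frac{\Psi(y)}{\Psi(x)}\,\frac{\langle y|e^{-s\beta H}|z\rangle}{e^{-s\beta E}}\frac{\Psi(z)}{\Psi(y)}\,dy = S^{t+s}(x,z)$$

$$\int\rho(x)S^t(x,y)\,dx = \int\Psi^2(x)\frac{\langle x|e^{-t\beta H}|y\rangle}{e^{-t\beta E}}\frac{\Psi(y)}{\Psi(x)}\,dx = \int\frac{\langle\Psi|x\rangle\langle x|e^{-t\beta H}|y\rangle\langle y|\Psi\rangle}{e^{-t\beta E}}\,dx = \frac{e^{-t\beta E}\langle\Psi|y\rangle\langle y|\Psi\rangle}{e^{-t\beta E}} = \rho(y)$$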
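The same three properties can be checked on a grid, where integrals become sums. The sketch below is illustrative: the parameter values, the 60-point periodic grid and the smooth potential are hypothetical, and the Hamiltonian is the standard finite-difference discretization rather than anything specified in the text.

```python
import numpy as np

beta, alpha = 1.0, 0.5                 # assumed parameter values
n, delta = 60, 0.2
idx = np.arange(n)
V = 0.5 * (1.0 - np.cos(2.0 * np.pi * idx / n))   # hypothetical smooth periodic potential

lap = np.zeros((n, n))
lap[idx, idx] = -2.0
lap[idx, (idx + 1) % n] = 1.0
lap[idx, (idx - 1) % n] = 1.0
H = -(alpha / beta) * lap / delta**2 + np.diag(V)

energies, states = np.linalg.eigh(H)
E0 = energies[0]
Psi = np.abs(states[:, 0])             # ground state, strictly positive
rho = Psi**2 / np.sum(Psi**2)          # stationary density

def propagator(t):
    """Grid version of (42): <x|exp(-t beta H)|y> Psi(y) / (Psi(x) exp(-t beta E0))."""
    kernel = (states * np.exp(-t * beta * (energies - E0))) @ states.T
    return kernel * Psi[None, :] / Psi[:, None]

S_t, S_s, S_ts = propagator(0.3), propagator(0.7), propagator(1.0)
```

Rows of `S_t` sum to one, `S_t @ S_s` reproduces `S_ts`, and `rho` is left unchanged by the propagator.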



We have used that $\Psi$ is a real function, which is also essential for the $\rho(x) = \Psi^2(x)$ formula, which in contrast to quantum mechanics does not require using absolute value. For this purpose Frobenius-Perron theorem was previously used - let us take it to the continuous case. This uniqueness and positiveness of the ground state eigenfunction can be found for example in Faris [20] - here is Theorem 10.3 from this book:

Theorem 22. Let $\mathcal{H} = L^2(M,\mu)$. Let $A: \mathcal{H} \to \mathcal{H}$ be a bounded self-adjoint operator. Assume that $A \leq a$ where $a$ is eigenvalue of $A$. Assume also that $A$ is positivity preserving. Then $A$ is indecomposable if and only if the eigenvalue $a$ has multiplicity one and the corresponding eigenspace is spanned by a function $u$ which is strictly positive almost everywhere.

Positivity preserving condition corresponds to nonnegativity of matrix: for all real $u \geq 0$, $Au$ is also real and $Au \geq 0$.

Indecomposability of operator $A$ corresponds to connectiveness - there is no projection operator ($P^2 = P$) onto a non-trivial closed subspace, such that $AP = PAP$.

We are interested in dominant eigenvalue of the operator, so assumption that it is bounded is necessary. This time there is no longer a problem with periodic graphs - discrete lattice can have period 2, but intuitively it degenerates while taking infinitesimal limit. In our case the operator is obtained as a limit of nonnegative matrices, so we automatically get the positivity preservation and so the main question is about the indecomposability condition. For example on page 72 of cited book, above theorem is used for the ground state of Schrodinger equation as we require:

Observation 23. Let $\mathcal{H} = L^2(\mathbb{R}^n, dx)$. Let $H_0 = -\Delta$ and let $V \geq 0$ be a function on $\mathbb{R}^n$ which is locally integrable on the complement of a closed set $K$ of measure zero. Assume that the complement of $K$ is connected. Then the ground state of $H = H_0 + V$ (if exists) is unique.

So infinite energy barriers could generally divide the space into independent components, 
but inside them the ground state is unique and real, nonnegative. 

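At the end of this subsection, let us look at the propagator decomposed in the eigenbase of Hamiltonian (assuming discrete energy spectrum):

$$H\Psi_i = E_i\Psi_i \quad\text{where}\quad E = E_0 < E_1 \leq ..$$

$$\langle\Psi_i|\Psi_j\rangle = \delta_{ij}, \qquad \sum_i|\Psi_i\rangle\langle\Psi_i| = 1$$

where all $\Psi_i$ can be chosen as real functions, but usually only $\Psi_0 = \Psi$ is nonnegative. Now we can write

$$S^t(x,y) = \sum_i e^{-t\beta(E_i - E_0)}\,\frac{\langle x|\Psi_i\rangle\langle\Psi_i|y\rangle\langle y|\Psi_0\rangle}{\langle x|\Psi_0\rangle} \qquad (43)$$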

Let us imagine that there is some idealized model preferring some concrete solutions, like
orbits of classical mechanics. We would like to add to these simplified considerations
small perturbations we cannot directly control, like those caused by the wave nature of particles.
The analogous natural thermodynamical approach would be, instead of considering a single
idealized solution, to use a canonical ensemble of perturbed ones. The presented simplified
approach uses nowhere differentiable diffusive trajectories, which are not very physical and at
this moment do not allow for additional restrictions. Further work is required, but the
general suggestion of the above propagator is that if an idealized solution is additionally somehow
restricted, adding thermodynamical perturbations would stochastically shift it toward a "near"
low eigenstate: having relatively large projection on the probability density. This restriction
could also be that lower eigenstates are already occupied by repelling particles, preventing
the choice of these dynamical equilibria by such a thermodynamical analogue of the Pauli
exclusion principle.



A thought-provoking observation from the above derivation of Schrödinger's Hamiltonian is that
the Laplacian term is not (like in its quantum mechanical interpretation) a result of kinetic energy,
but of using a lattice as discretization like in Bose-Hubbard model - it corresponds only to the freedom
of moving in the space. We will later introduce a momentum operator in analogy to quantum
mechanics, but it describes only density flow, not the real momentum of the particle. To
include kinetic energy, we would need to include the velocity of the particle first - consider a random
walk in phase space like in Langevin equation.

[Figure 15 panels, with stationary densities for infinite potential well on [0,1], where $\varphi(x) = \psi(x) = \sin(\pi x)$:
- Local behaviour (GRW), "static" statistical physics: $\rho(x) \propto e^{-\beta V(x)}$, for the well $\rho(x) = 1$
- Future trajectories or past trajectories: $\rho(x) \propto \psi(x)$, for the well $\rho(x) = \frac{\pi}{2}\sin(\pi x)$
- Full trajectories (MERW): $\rho(x) \propto \varphi(x)\psi(x)$, for the well $\rho(x) = 2\sin^2(\pi x)$]

Figure 15: Comparison of different ensembles and their stationary probability densities for
infinite potential well. Nonlocality required for such effective models does not imply nonlocality
of the original model we are trying to predict.



4.4 Comparison of ensembles and interpretations 

If we make the analogous infinitesimal limit of GRW instead, from (32) we get stationary
probability density:

$\rho^{GRW}(x) \propto e^{-\beta V(x)}$ (44)

While the characteristic length in MERW remains infinite, in the GRW case it is the distance
corresponding to the nearest neighbor ($\delta$), so it drops to zero in the infinitesimal limit - the walker
makes succeeding random decisions according only to the situation in a given point. In comparison,
MERW randomness represents only our knowledge and the walker could even make decisions
in an unknown complex deterministic way. There is also completely no need for the walker to know
this (nonlocal) model we use - it appeared by our assumption of a thermodynamical ensemble
of possible scenarios the walker could choose - asymptotically dominating all other assumptions we
could make.



Figure 15 compares three basic approaches to statistical ensembles we could choose. The
first one has practically no localization properties. The last one is MERW-based, leading to
probability density localized exactly as in quantum mechanics. The middle one has first power
relating amplitudes and probabilities, so it would not violate Bell inequalities. However, such
an assumed ensemble changes while time passes, so there is no direct way to infer transition
probabilities for it (connecting different ensembles). The MERW situation can be seen as two
glued middle situations - leading to the squares required to predict probability on a constant time
cut of such an ensemble of abstract four-dimensional scenarios.
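
The difference between the first and the last ensemble can be reproduced on a discretized well.
The sketch below is only an illustration under stated assumptions: a segment graph of 30 nodes with
self-loops and no potential, and hypothetical helper names. GRW gives a nearly uniform density
proportional to node degrees, while MERW gives the squared sine of the dominant eigenvector.

```python
from math import sin, pi, sqrt

def merw_density_on_segment(n, iterations=5000):
    """MERW stationary density on a segment graph with self-loops (M = A + I).

    M is symmetric, so the left and right dominant eigenvectors coincide and the
    density is the squared, unit-length dominant eigenvector (power method)."""
    psi = [1.0] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            total = psi[i]                 # self-loop
            if i > 0:
                total += psi[i - 1]        # left neighbor
            if i < n - 1:
                total += psi[i + 1]        # right neighbor
            new.append(total)
        norm = sqrt(sum(x * x for x in new))
        psi = [x / norm for x in new]
    return [x * x for x in psi]

def grw_density_on_segment(n):
    """GRW stationary density: proportional to the number of outgoing edges."""
    degree = [3.0 if 0 < i < n - 1 else 2.0 for i in range(n)]
    total = sum(degree)
    return [d / total for d in degree]

n = 30
merw = merw_density_on_segment(n)
grw = grw_density_on_segment(n)

# Closed form for comparison: psi_k ~ sin(pi k / (n + 1)) for k = 1..n.
expected = [sin(pi * (k + 1) / (n + 1)) ** 2 for k in range(n)]
scale = sum(expected)
expected = [e / scale for e in expected]
```

The ratio between the largest and smallest GRW probability stays at 3/2 for any `n`, while the MERW
density vanishes toward the boundaries like the ground state of the infinite potential well.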

To conclude this section, let us think how to interpret the obtained continuous probabilistic
models. The standard view on Brownian motion is stochastic: that the object literally chooses
succeeding steps of a nowhere differentiable trajectory using locally maximizing entropy
transition probabilities, like a particle drifting in a fluid. On the other side there is the ergodic picture
of classical chaos - the object travels through some concrete trajectory, governed for example
by classical mechanics, and this trajectory effectively covers the whole space, allowing to
introduce a density function by averaging over infinite time.

Imagining a particle, it should travel through a differentiable and generally more determined
than diffusive trajectory. The standard assumption of chaos theory that our model can fully
describe the evolution is often also not appropriate, because there are usually plenty of degrees
of freedom hidden from us, which in practice can be included in considerations only
as thermodynamical fluctuations. So what we would like is something intermediate - to model
physical trajectories and include thermodynamics.

The MERW-based approach is intended to be a thermodynamical model - because we do
not know what exactly is happening, we assume Boltzmann distribution among possibilities:
trajectories. Like in the stochastic picture, it is defined by local transition probabilities, but this
time they are calculated, not assumed - to fully optimize entropy/free energy. In subsection
3.3 we have seen that MERW formulas can be obtained by calculating proportions between
occurrences of patterns in the ensemble of all full paths. These transition probabilities are no
longer defined by only local conditions, but require the knowledge about the whole situation
- they should not be interpreted that the walker uses them directly as in the stochastic model, but
only that we use them to predict its probability density. Transition probabilities for single steps fully
determine the process (Markov property), so it is enough to focus on finding probabilities for
single steps - calculate transition probabilities as proportions between single steps in canonical
ensemble of trajectories going through a given point like in Fig. [Tj 
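
The statement that transition probabilities are proportions between single steps in the ensemble of
paths can be illustrated by direct counting. The sketch below is an illustration only, assuming
uniform edge weights (all paths equally probable) and a hypothetical 4-node graph. The past parts of
paths going through a given node are common to all continuations, so they cancel and only future
half-paths need to be counted.

```python
from math import sqrt

# Hypothetical example graph: a triangle (nodes 1, 2, 3) with a pendant node 0.
A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]
N = len(A)

def count_paths(length):
    """Number of paths with `length` steps starting from each node (exact integers)."""
    counts = [1] * N
    for _ in range(length):
        counts = [sum(A[i][j] * counts[j] for j in range(N)) for i in range(N)]
    return counts

def step_proportions(length):
    """Fraction of `length`-step paths from i whose first step goes to j."""
    shorter = count_paths(length - 1)
    longer = count_paths(length)
    return [[A[i][j] * shorter[j] / longer[i] for j in range(N)] for i in range(N)]

def merw_transitions(iterations=500):
    """MERW formula S_ij = A_ij psi_j / (lambda psi_i), psi from the power method."""
    psi = [1.0] * N
    for _ in range(iterations):
        new = [sum(A[i][j] * psi[j] for j in range(N)) for i in range(N)]
        norm = sqrt(sum(v * v for v in new))
        psi = [v / norm for v in new]
    image = [sum(A[i][j] * psi[j] for j in range(N)) for i in range(N)]
    lam = sum(p * q for p, q in zip(psi, image))
    return [[A[i][j] * psi[j] / (lam * psi[i]) for j in range(N)] for i in range(N)]

proportions = step_proportions(80)
S = merw_transitions()
```

Already for paths of a few tens of steps the counted proportions agree with the MERW formula, which
uses knowledge about the whole graph through $\psi$.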

The problem is that the trajectories we use in the ensemble are diffusive, while we would rather
expect restricting this ensemble to more physical ones - differentiable, approximately preserving
energy (up to thermodynamical fluctuations). Because at this moment we are interested
only in probabilities of single steps, which become infinitesimally small in the continuous limit,
one could expect that such restriction to physical trajectories should practically not change
local transition probabilities - let us call this assumption the restricted ensemble hypothesis:

Hypothesis 1. The probability distribution of infinitesimal steps, obtained by averaging over the
ensemble of all trajectories, practically does not change while restricting to physical trajectories.

There remains the question of defining what we mean by physical trajectories - we should not
focus just on classical deterministic ones like in classical chaos, but remember that they are
usually idealized: they neglect many hidden degrees of freedom. However, including such
thermodynamical perturbations should not affect proportions between infinitesimal steps while
averaging over the ensemble, leading again to the above restricted ensemble hypothesis.

The real difference between diffusive and physical trajectories appears on larger than
infinitesimal time scale - physical trajectories in space do not fulfill the Markov property, but their
behavior depends on additional degrees of freedom like velocity. It could be improved by using
a Markov process in phase space instead, getting a more complicated, optimizing entropy/free
energy analogue of Langevin equation.

Observe that even if the obtained model is not a Markov process, transition probabilities
describe average behavior from a given point - while we should be careful about using the Markov
propagator, the stationary probability distribution is always the dominant eigenvector of
the stochastic matrix. The fact that the expected probability distribution agrees with the
thermodynamical equilibrium predicted by quantum mechanics suggests that the presented approach
with the above hypothesis is a reasonable interpretation.

To summarize, using MERW-based models to generate stochastic trajectories is not the
proper intuition. The transition probabilities should rather be seen as local behavior averaged
over all possible scenarios. The stochastic propagator assumes the Markov property, so generally
we should be careful while interpreting it. The most essential conclusion from these models
is that the equilibrium probability density is a universal thermodynamical effect, especially
because it is in agreement with the thermodynamical equilibrium of quantum mechanics.



5 Time dependence 

We will now focus on the situation when the $M$ matrix can vary in time. The general discrete
case will be considered first. These considerations can be used if some graph edges are added or
removed while time passes. Later we will use them for lattice graphs and
vary only weights of edges, representing evolution of potential. Finally we will look at the
continuous limit and find analogues of probability current, momentum operator, Ehrenfest
equation and Heisenberg uncertainty principle. These considerations improve intuition about
time symmetry of this thermodynamical model.

For simplicity in this section we assume that the graph is aperiodic, but this restriction can be
easily removed.



5.1 General discrete case 

Let us generalize previous results to $M$ varying with time, which can represent the change of
potential like:



V^. = Vijit) Mf. = A;^e"^^y or symmetric M5. = e"^^ 

Previously powers of the matrix represented behavior on time segments, now time dependent
analogues are required. In this section we will use the upper index as time instead of power.



For Boltzmann distribution among paths, by extending definition 21 we would like that:

energy of path $(\gamma_t, \gamma_{t+1}, \ldots, \gamma_s)$ is $V^t_{\gamma_t\gamma_{t+1}} + \ldots + V^{s-1}_{\gamma_{s-1}\gamma_s}$, (45)

where for this section we will assume that $s \geq t$.

For this generalization we can retrace one of the original derivations:

- by expanding path ensemble in both time directions, automatically getting normalization like
in 3.3, or

- by expanding in single direction looking at its first step and then make normalization like in 

EH 

5.1.1 Generalized dominant eigenvectors 

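In any case, we need analogues of right and left dominant eigenvectors (real, nonnegative):
probability distributions for the final situation of ensembles of infinite one-sided paths - into the
future ($\psi$) and eventually into the past ($\varphi$):

$\psi^t_i := \lim_{l\to\infty} \frac{\sum_j (M^t M^{t+1}..M^{t+l-1})_{ij}}{\mathcal{N}^t(l)}$, $\varphi^t_i := \lim_{l\to\infty} \frac{\sum_j (M^{t-l}..M^{t-2}M^{t-1})_{ji}}{\tilde{\mathcal{N}}^t(l)}$ ($\geq 0$) (46)

where $\mathcal{N}^t$, $\tilde{\mathcal{N}}^t$ are some normalizing functions - as previously $\varphi^t_i\psi^t_i$ will correspond to
probability of $i$ (in time $t$), so it would be useful to make that $\sum_i \varphi^t_i\psi^t_i = 1$. It could be
achieved using $\mathcal{N}^t(l) = \tilde{\mathcal{N}}^t(l) = \sqrt{\sum_{ij}(M^{t-l}..M^{t+l-1})_{ij}}$, but it could make one eigenvector
vanishing while the second goes to infinity. Anyway, these formulas using propagator from plus
or minus infinity are rather impractical - more useful are analogues of eigenvector equations:

$(M^t\psi^{t+1})_i = \lim_{l\to\infty}\frac{\sum_j (M^t M^{t+1}M^{t+2}..M^{t+l})_{ij}}{\mathcal{N}^{t+1}(l)} = \lambda^t\psi^t_i$ (47)

$((\varphi^t)^T M^t)_i = \lim_{l\to\infty}\frac{\sum_j (M^{t-l}..M^{t-1}M^t)_{ji}}{\tilde{\mathcal{N}}^t(l)} = \tilde\lambda^t\varphi^{t+1}_i$ (48)

where $\lambda^t = \lim_{l\to\infty}\frac{\mathcal{N}^t(l+1)}{\mathcal{N}^{t+1}(l)}$, $\tilde\lambda^t = \lim_{l\to\infty}\frac{\tilde{\mathcal{N}}^{t+1}(l+1)}{\tilde{\mathcal{N}}^t(l)}$.

Let us assume that $(\varphi^t)^T\psi^t$ remains constant for normalization:

$(\varphi^{t+1})^T\psi^{t+1} = \frac{(\varphi^t)^T M^t\psi^{t+1}}{\tilde\lambda^t} = \frac{\lambda^t}{\tilde\lambda^t}(\varphi^t)^T\psi^t$ (49)

So assumption that $(\varphi^t)^T\psi^t = $ const is equivalent to $\tilde\lambda = \lambda$ as expected (also for different
eigenvectors). Other view on this condition is through making below multiplication in two
ways:

$(\varphi^t)^T M^t\psi^{t+1} = \lambda^t$ (generally: $(\varphi^t_k)^T M^t\psi^{t+1}_k = \lambda^t_k$) (50)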

5.1.2 Generalized further eigenvectors 

Choosing $M$ as constant in time, we see that these generalized eigenvector equations should
have $N$ orthogonal solutions (for this short part, lower index denotes the number of solution
of eigenequation):

$M^t\psi^{t+1}_k = \lambda^t_k\psi^t_k$, $(\varphi^t_k)^T M^t = \lambda^t_k(\varphi^{t+1}_k)^T$ for $\lambda^t = \lambda^t_1 > \lambda^t_2 \geq \ldots \geq \lambda^t_N$






which is fulfilled by decomposition analogous to the stationary case:

$M^t = \sum_k \lambda^t_k\,\psi^t_k(\varphi^{t+1}_k)^T$ for orthogonal $(\varphi^t_k)^T\psi^t_l = \delta_{kl}$

For locally stationary situation we can use standard diagonalization, then the eigenvector
equations can be used to obtain their evolution. In our case we are only interested in the
dominant ones: $\psi^t_1 = \psi^t$, $\varphi^t_1 = \varphi^t$. They can also be directly calculated using (46), which can
be seen as a generalization of the power method for finding the dominant eigenvector.

5.1.3 Time dependent MERW 

With the $\tilde\lambda = \lambda$ assumption we know that $(\varphi^t)^T\psi^t$ is constant in time, so the exact choice of $\lambda$ only
determines balance between $\varphi$ and $\psi$, such that their product remains constant. We are mainly
interested in the $\pi$ vector depending on their product and the $S$ matrix depending on division of $\psi$
coordinates, so this balance between $\varphi$ and $\psi$ is in fact irrelevant - this means we could choose
practically any $\lambda$. The only problem of choosing it in a not optimal way is that $\psi$ could grow
exponentially to infinity while $\varphi$ would drop to zero or oppositely, which could be inconvenient
in practical calculations.

There are some situations allowing to "calibrate" $\lambda$, for example if potential is practically
constant on some time segment, both $\varphi$ and $\psi$ should tend to the stationary situation in
which they are left and right dominant eigenvectors of this locally constant $M$ - choosing $\lambda$ as
the corresponding eigenvalue allows to make these eigenvectors constant. If potential evolution
is slow enough, we could assume that such equilibrium is constantly maintained - in such
adiabatic approximation probability density in a given time should be nearly the same as if
the potential remained constant in time.



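Now considering ensembles of paths growing in both time directions, (24) becomes:

$\frac{\Pr((\gamma_i)_{i=t}^s)}{\Pr((\gamma'_i)_{i=t}^s)} = \frac{\varphi^t_{\gamma_t}\,M^t_{\gamma_t\gamma_{t+1}}M^{t+1}_{\gamma_{t+1}\gamma_{t+2}}..M^{s-1}_{\gamma_{s-1}\gamma_s}\,\psi^s_{\gamma_s}}{\varphi^t_{\gamma'_t}\,M^t_{\gamma'_t\gamma'_{t+1}}M^{t+1}_{\gamma'_{t+1}\gamma'_{t+2}}..M^{s-1}_{\gamma'_{s-1}\gamma'_s}\,\psi^s_{\gamma'_s}}$ (51)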



As previously, for $s = t$ we get the dynamical analogue of stationary probability density - without
additional knowledge, the optimal assumption of probability density in a given moment is:

$\pi^t_i = \varphi^t_i\psi^t_i$ for $\sum_i\varphi^t_i\psi^t_i = 1$ (52)

For $s = t + 1$ we get the time dependent $S$ matrix:

$S^t_{ij} = \frac{\varphi^t_i M^t_{ij}\psi^{t+1}_j}{\lambda^t\varphi^t_i\psi^t_i} = \frac{M^t_{ij}}{\lambda^t}\frac{\psi^{t+1}_j}{\psi^t_i}$ (53)

The required normalization constant ($1/\lambda^t$) is a consequence of (47).

Figure 16: Example of segment graph with cyclic boundary condition on which the potential well
is switched. Amplitudes are normalized to one, so they can be interpreted as
densities on the end of past/future ensembles of one-sided paths like in Fig. [Ts]. Their
non-adiabatic evolution starts from the stationary state in past/future and evolves toward future/past.

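The analogue of power of $S$ matrix now depends on the time segment this propagator corresponds
to:

$S^{t,s}_{ij} := (S^tS^{t+1}..S^{s-1})_{ij} = \frac{(M^tM^{t+1}..M^{s-1})_{ij}}{\lambda^t\lambda^{t+1}..\lambda^{s-1}}\frac{\psi^s_j}{\psi^t_i}$ (54)

The Boltzmann distribution among finite length paths became:

$\Pr(\gamma_t\gamma_{t+1}..\gamma_s) = \pi^t_{\gamma_t}S^t_{\gamma_t\gamma_{t+1}}S^{t+1}_{\gamma_{t+1}\gamma_{t+2}}..S^{s-1}_{\gamma_{s-1}\gamma_s} = \frac{\varphi^t_{\gamma_t}\,M^t_{\gamma_t\gamma_{t+1}}M^{t+1}_{\gamma_{t+1}\gamma_{t+2}}..M^{s-1}_{\gamma_{s-1}\gamma_s}\,\psi^s_{\gamma_s}}{\lambda^t\lambda^{t+1}..\lambda^{s-1}}$ (55)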



Fig. 16 brings some intuition about the situation: $\psi$ represents estimation of the future
behavior and evolves backward in time, and $\varphi$ represents knowledge about the past situation
and evolves forward in time. This situation has time symmetry: transposing $M$ and negating
time would switch $\psi$ and $\varphi$. The retrocausality of this effective model means only that to make
the best estimation of the walker's position, we should know how the system will change in
the future. It cannot be interpreted that the walker needs to know the future.
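
The two-directional evolution can be sketched numerically as follows. This is only an illustration
under assumptions not taken from the text: a ring of 8 nodes with self-loops, edge weights
$e^{-\beta(V_i + V_j)/2}$, a well switched from node 2 to node 5, uniform boundary vectors in the far
past and future, and hypothetical function names. The freedom of choosing $\lambda^t$ is used to keep
$\psi^t$ normalized, and the same $\lambda^t$ is then used for $\varphi$, so $(\varphi^t)^T\psi^t$ stays constant.

```python
from math import exp

def ring_matrix(potential, beta=1.0):
    """Ring graph with self-loops; the edge weight formula is an illustrative assumption."""
    n = len(potential)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in ((i - 1) % n, i, (i + 1) % n):
            M[i][j] = exp(-beta * (potential[i] + potential[j]) / 2)
    return M

def time_dependent_merw(matrices, psi_final, phi_initial):
    """Return densities pi^t (t = 0..steps) and stochastic matrices S^t (t = 0..steps-1)."""
    n, steps = len(psi_final), len(matrices)
    psi = [None] * steps + [list(psi_final)]
    lam = [0.0] * steps
    for t in reversed(range(steps)):            # psi evolves backward in time
        M = matrices[t]
        image = [sum(M[i][j] * psi[t + 1][j] for j in range(n)) for i in range(n)]
        lam[t] = sum(image)                     # free choice of lambda^t: keeps sum(psi^t) = 1
        psi[t] = [v / lam[t] for v in image]
    overlap = sum(a * b for a, b in zip(phi_initial, psi[0]))
    phi = [[a / overlap for a in phi_initial]]  # normalize so that phi^0 . psi^0 = 1
    for t in range(steps):                      # phi evolves forward in time
        M = matrices[t]
        phi.append([sum(phi[t][i] * M[i][j] for i in range(n)) / lam[t] for j in range(n)])
    density = [[phi[t][i] * psi[t][i] for i in range(n)] for t in range(steps + 1)]
    S = [[[matrices[t][i][j] * psi[t + 1][j] / (lam[t] * psi[t][i]) for j in range(n)]
          for i in range(n)] for t in range(steps)]
    return density, S

n = 8
early = [-1.0 if i == 2 else 0.0 for i in range(n)]   # potential well at node 2
late = [-1.0 if i == 5 else 0.0 for i in range(n)]    # well switched to node 5
matrices = [ring_matrix(early)] * 6 + [ring_matrix(late)] * 6
density, S = time_dependent_merw(matrices, [1.0] * n, [1.0] * n)
```

Each `S[t]` is a stochastic matrix and it transports `density[t]` exactly into `density[t + 1]`,
although computing it required the backward pass through all later matrices.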



5.2 Continuous limit 

As previously, we would like now to find the continuous limit for lattices. Knowing the
dominant eigenvectors in some moment, we can use analogues of eigenvector equations to get
their evolution:

$M^t\psi^{t+1} = \lambda^t\psi^t$, $(M^t)^T\varphi^t = \lambda^t\varphi^{t+1}$ (56)

In addition to the stationary situation, there appears time dependence of these generalized
eigenvectors now - difference between succeeding ones will lead to time derivative in the continuous
limit.

The fact that $\psi$ represents ensemble of possible further evolutions allowed previously to
understand the requirement of squares relating amplitudes and probabilities. In opposite to
$\varphi$, the natural direction of evolution of $\psi$ is into the past. We are usually interested in evolution
into the future, which leads to some inconvenience - for this purpose we should iterate the inverse
matrix instead, but it diametrically changes the attractors ($\lambda \to 1/\lambda$). For example the attracting
dominant eigenvector becomes the most repelling one - of the least absolute value. We will
derive and use these unstable equations in theoretical considerations, but one has to be careful
about using them especially in numerical calculations. For this purpose, it is better to start with
some equilibrium in future and evolve $\psi$ backward in time like in Fig. 16.

In opposite to thermodynamics, quantum mechanics has unitary evolution in which this 
inconvenience disappears. Instead of exponentially vanishing/exploding coordinates (eigen- 
values on real axis), the absolute coordinates remain constant (eigenvalues on unit circle). 



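This time (35) approximation for $D = 1$ becomes (unstable):

$\lambda^t\psi^t_x = (M^t\psi^{t+1})_x \approx \psi^{t+1}_{x-1} + \psi^{t+1}_x + \psi^{t+1}_{x+1} - 3\epsilon\beta V^t_x\psi^t_x$

The last $\psi$ should originally have $t + 1$ upper index, but we are again interested in infinitesimal
limit, in which terms of higher than epsilon order will vanish. As previously let us connect
eigenvalue with energy: $\lambda^t = 3 - 3\epsilon\beta E^t$, getting

$3(\psi^t_x - \psi^{t+1}_x) - 3\epsilon\beta E^t\psi^t_x \approx \psi^{t+1}_{x-1} - 2\psi^{t+1}_x + \psi^{t+1}_{x+1} - 3\epsilon\beta V^t_x\psi^t_x$

$-\frac{1}{\beta}\frac{\psi^{t+1}_x - \psi^t_x}{\epsilon} - E^t\psi^t_x \approx \frac{\psi^{t+1}_{x-1} - 2\psi^{t+1}_x + \psi^{t+1}_{x+1}}{3\epsilon\beta} - V^t_x\psi^t_x$

Now as in 4.3, the infinitesimal limit for general $D$ becomes:

$\frac{1}{\beta}\frac{\partial}{\partial T}\Psi(X,T) + E(T)\Psi(X,T) = \hat{H}(T)\Psi(X,T)$ (57)

where $T = \epsilon t$, $X = \delta x$, $\epsilon = \frac{\delta^2}{(2D+1)\alpha}$, $E(T) = \lim_{\epsilon\to 0}E^{T/\epsilon}$ and as previously

$\hat{H}(T)\Psi(X,T) := -\frac{\alpha}{\beta}\Delta\Psi(X,T) + V(X,T)\Psi(X,T)$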



We can make the same route for $\varphi$ or just look at (56) - now time derivative is with opposite
sign and we should use the transposed matrix instead. The matrix is real, so this transposition
corresponds to conjugation of the obtained Hamiltonian. It is usually self-adjoint ($\hat{H}^\dagger = \hat{H}$), but
let us look at the general situation:

$-\frac{1}{\beta}\frac{\partial}{\partial T}\Phi(X,T) + E(T)\Phi(X,T) = \hat{H}^\dagger(T)\Phi(X,T)$

Finally we can write these equations for evolution of probability densities for past and future half
planes:

$\frac{1}{\beta}\frac{\partial}{\partial t}\Phi = (E - \hat{H}^\dagger)\Phi$, $\frac{1}{\beta}\frac{\partial}{\partial t}\Psi = (\hat{H} - E)\Psi$ (58)

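In quantum mechanics $\langle\psi|\psi\rangle = $ const for complex $\psi$ because "bra" rotates in one
direction ($\langle\psi| \to \langle\psi|e^{i\hat{H}t/\hbar}$), while "ket" rotates in the other ($|\psi\rangle \to e^{-i\hat{H}t/\hbar}|\psi\rangle$). Here
$\langle\Phi|\Psi\rangle = $ const for real positive $\Phi$, $\Psi$ because one drops exponentially while the other rises
($\langle\Phi| \to \langle\Phi|e^{-\beta t(\hat{H}-E)}$, $|\Psi\rangle \to e^{\beta t(\hat{H}-E)}|\Psi\rangle$).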

As it was commented for the general time dependent case, the choice of $E$ does not affect
probability density or propagator, so if we do not care that eigenvectors go to zero and
infinity, we could choose even $E = 0$. For numerical simulations eigenvectors can be synchronized
to $\Psi = \Phi$ for self-adjoint $\hat{H}$ in locally stationary situations, by using $E$ as the ground state
energy. As previously, we can also make adiabatic approximation - calculate $\Psi(t)$, $\Phi(t)$ and
$E(t)$ as if the potential was not going to change:

$E(t) = \langle\Phi(t)|\hat{H}\Psi(t)\rangle$

Having $\Phi$ and $\Psi$ we can calculate the expected probability density:

$\rho(x,t) = \Phi(x,t)\Psi(x,t)$ if $\int\Phi(x,t)\Psi(x,t)dx = 1$ (59)

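Hamiltonians having different potentials generally do not commute, so formally like in
quantum mechanics to calculate propagator there is required time-ordering operator ($\mathcal{T}$):

$S^{t,s}(x,y) = \lim_{\epsilon\to 0}\int_{x_1x_2..} S^{t,t+\epsilon}(x,x_1)S^{t+\epsilon,t+2\epsilon}(x_1,x_2)...dx_1dx_2.. = \frac{\langle x|\mathcal{T}e^{-\beta\int_t^s\hat{H}(\tau)d\tau}|y\rangle}{\mathcal{N}(t,s)}\frac{\Psi(y,s)}{\Psi(x,t)}$

where $\mathcal{N}(t,s)$ can be obtained from $\int S^{t,s}(x,y)dy = 1$ normalization condition.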

5.2.1 Probability current 

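The probability current is much simpler to obtain:

$\frac{\partial\rho}{\partial t} = \frac{\partial\Phi}{\partial t}\Psi + \Phi\frac{\partial\Psi}{\partial t} = \alpha((\Delta\Phi)\Psi - \Phi(\Delta\Psi)) = \alpha\nabla\cdot((\nabla\Phi)\Psi - \Phi(\nabla\Psi))$

So probability current and continuity equation became:

$J = \alpha(\Phi(\nabla\Psi) - (\nabla\Phi)\Psi)$, $\frac{\partial}{\partial t}\rho = -\nabla\cdot J$ (60)

Let us find correspondence between this formula for real $\Phi$, $\Psi$ and its quantum mechanical
analogue: $j = \frac{\hbar}{2mi}(\bar\psi\nabla\psi - \psi\nabla\bar\psi)$ for complex $\psi$. Substituting for example $\psi = \frac{e^{i\gamma}}{\sqrt{2}}(\Phi + i\Psi)$
for some phase $\gamma$, we get:

$j = \frac{\hbar}{4mi}((\Phi - i\Psi)\nabla(\Phi + i\Psi) - (\Phi + i\Psi)\nabla(\Phi - i\Psi)) = \frac{\hbar}{2m}(\Phi(\nabla\Psi) - (\nabla\Phi)\Psi)$

It is exactly the equation (60) obtained from MERW for $\alpha = \frac{\hbar}{2m}$. Generally above choice
of $\psi$ changes probability density ($|\psi|^2 \neq \Phi\Psi$), but we get equality if $\Psi = \Phi$, which can be
obtained for self-adjoint $\hat{H}$ when the system evolves so slowly that we can assume constant
equilibrium (adiabatic approximation).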



5.2.2 Ehrenfest equations 

Like in quantum mechanics, we can introduce operators acting on states in fixed time - we
will look at their expected values here. This time they are not necessarily self-adjoint, so we
need to clearly describe which side they apply to:

$\langle\hat{O}\rangle := \langle\Phi|\hat{O}\Psi\rangle$

Probability current allows to calculate time evolution of the expected position:

$\frac{d}{dt}\langle\hat{x}\rangle = \frac{d}{dt}\int x\rho\,dx = \int x\frac{\partial\rho}{\partial t}dx = -\int x(\nabla\cdot J)dx = \int J\,dx = 2\alpha\int\Phi(\nabla\Psi)dx = \frac{\langle\hat{p}\rangle}{m}$

where we have used partial integration twice, assuming vanishing at boundaries and

$\hat{p} := 2m\alpha\nabla$ ($= \hbar\nabla$ for $\alpha = \frac{\hbar}{2m}$) (61)

This time momentum operator is not self-adjoint, but anti-Hermitian ($\hat{p}^\dagger = -\hat{p}$) - to apply it
to $\Phi$ we would use $-\hbar\nabla$ instead. Intuitively $\Phi$ is the one evolving forward in time (stable) -
the momentum operator can be imagined as follows: the increase of density brings thermodynamical
flow in opposite direction to equilibrate densities.

For not self-adjoint $\hat{p}$, usually $\hat{p}^2$ also is not self-adjoint - we can repair it using $\hat{p}^\dagger\hat{p}$
instead. Now the choice $\alpha = \frac{\hbar}{2m}$ as previously allows us to write our $\hat{H}$ in more physical way:

$\hat{H} = \frac{\hat{p}^\dagger\hat{p}}{2m} + V$ (62)

This choice of $\alpha$ means that

$\frac{\alpha}{\beta} = \frac{\hbar^2}{2m}$, $\beta = \frac{1}{\hbar}$ for $\alpha = \frac{\hbar}{2m}$ (63)

Figure 17: Simulation showing time-symmetry and that acceleration is opposite than in
Newton's equations. While switching potential, probability density travels from thermodynamical
equilibrium in the first well to the second one - first accelerating uphill the potential, then
decelerating downhill. The acceleration grows with $\langle\nabla V\rangle$.



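Let us now find time evolution of the expected value of a general operator - analogue of
Ehrenfest equation:

$\frac{d}{dt}\langle\Phi|\hat{O}\Psi\rangle = \beta\langle\Phi|(E - \hat{H})\hat{O}\Psi\rangle + \langle\Phi|\frac{\partial\hat{O}}{\partial t}\Psi\rangle + \beta\langle\Phi|\hat{O}(\hat{H} - E)\Psi\rangle$

$\frac{d}{dt}\langle\hat{O}\rangle = \left\langle\frac{\partial\hat{O}}{\partial t}\right\rangle + \beta\langle[\hat{O},\hat{H}]\rangle$ (64)

for example

$[\hat{x},\hat{p}] = -2m\alpha = -\hbar$ (65)

$[\hat{x},\hat{H}] = 2\frac{\alpha}{\beta}\nabla \Rightarrow \frac{d\langle\hat{x}\rangle}{dt} = \langle 2\alpha\nabla\rangle = \frac{\langle\hat{p}\rangle}{m}$

$[\hat{p},\hat{H}] = [\hbar\nabla, V] = \hbar\nabla V \Rightarrow \frac{d}{dt}\langle\hat{p}\rangle = \beta\langle\hbar\nabla V\rangle = \langle\nabla V\rangle = \int\rho(x)\nabla V(x)dx$ (66)

Surprisingly, we get

$m\frac{d^2}{dt^2}\langle\hat{x}\rangle = \langle\nabla V\rangle$

which is opposite acceleration than in Newton's law. The example in figure 17 gives intuition
that this seemingly contradictory result is in fact expected. The system constantly searches for
the thermodynamical equilibrium - while evolution is slow enough and the current entropy
well remains optimal, probability density remains there. However, when a different local
potential minimum starts dominating like in Fig. 9, probability density has to get out of the
current potential well - accelerating uphill, then decelerating downhill to finally stop in the
new optimal minimum.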

This example also provides an instructive demonstration of time-symmetry of this
thermodynamical model. Our best estimation of particle's probability density in plus or minus infinity is
the ground state of the corresponding potential well. The switch is symmetric, so our prediction
of intermediate densities is also time-symmetric.

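5.2.3 Heisenberg uncertainty principle

Having momentum operator, we can derive Heisenberg uncertainty principle analogue for
adiabatic approximation ($\Phi = \Psi$). This time $\hat{p}$ is not self-adjoint, so $\langle\hat{p}^2\rangle$ does not have to be
nonnegative. Instead, like for Hamiltonian we need to use:

$\langle\hat{p}^\dagger\hat{p}\rangle = \langle\Psi|\hat{p}^\dagger\hat{p}\Psi\rangle = \langle\hat{p}\Psi|\hat{p}\Psi\rangle$

Now for any $\lambda$:

$0 \leq \langle(\hat{x} + \lambda\hat{p})\Psi|(\hat{x} + \lambda\hat{p})\Psi\rangle = \langle\Psi|(\hat{x} - \lambda\hat{p})(\hat{x} + \lambda\hat{p})\Psi\rangle = \langle\hat{x}^2\rangle + \lambda^2\langle\hat{p}^\dagger\hat{p}\rangle - \lambda\hbar$

This quadratic polynomial in $\lambda$ must have nonpositive discriminant, getting analogue of Heisenberg
uncertainty principle:

$\sqrt{\langle\hat{x}^2\rangle}\sqrt{\langle\hat{p}^\dagger\hat{p}\rangle} \geq \frac{\hbar}{2}$ (67)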
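
The bound can be checked numerically on a grid. The sketch below is an illustration only, assuming
the adiabatic case $\Phi = \Psi$, units with $\hbar = 1$, and a hypothetical helper
`uncertainty_product`. It uses $\langle\hat{p}^\dagger\hat{p}\rangle = \hbar^2\int(\nabla\Psi)^2dx$ for $\Psi$ vanishing at the
boundaries. A Gaussian saturates the bound, while the ground state of the infinite well on [0,1]
stays above it.

```python
from math import exp, pi, sin, sqrt

HBAR = 1.0  # units chosen for the illustration

def uncertainty_product(psi, a, b, points=20001):
    """sqrt(<x^2> <p^dagger p>) for Phi = Psi = psi, a real function vanishing at a and b."""
    h = (b - a) / (points - 1)
    xs = [a + k * h for k in range(points)]
    values = [psi(x) for x in xs]
    norm = sum(v * v for v in values) * h                  # integral of psi^2
    x2 = sum(x * x * v * v for x, v in zip(xs, values)) * h / norm
    # <p^dagger p> = hbar^2 * integral of (psi')^2, derivative taken between grid points
    grad2 = sum(((values[k + 1] - values[k]) / h) ** 2 for k in range(points - 1)) * h / norm
    return sqrt(x2 * HBAR ** 2 * grad2)

sigma = 0.7
gaussian = uncertainty_product(lambda x: exp(-x * x / (4 * sigma ** 2)), -10.0, 10.0)
well = uncertainty_product(lambda x: sin(pi * x), 0.0, 1.0)
```

Note that, as in (67), the position term is $\langle\hat{x}^2\rangle$ and not the variance, so the value for the well
depends on where the origin is placed.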

6 Multiple particles 

In this section extensions of MERW methodology for multiple walkers on a single graph will
be discussed. It will be made in analogy to quantum mechanics, but these considerations
are purely thermodynamical. The original graph of positions needs to be extended to a graph of
particle configurations on it - for distinction, in this section we will call vertices of the original
graph nodes, and those of the extended graph vertices.

6.1 Noninteracting particles 

The stationary probability density from the previous sections was originally obtained as the
best assumption we can make (maximizing uncertainty) for a single particle on the graph. It
is also the dominant eigenvector of the stochastic matrix $S$, so in time-independent case it can
also be seen as time average of particle's position over a long period.

Another general view is for multiple noninteracting particles - if they behave independently
and each of them is expected to agree with this prediction, their actual density (current
distribution among nodes) should asymptotically be near this expected probability density. More
precisely, the probability of a given disagreement drops asymptotically exponentially with the
number of particles and the coefficient is Kullback-Leibler distance between these distributions.

To see it, let us look at the simplest case of $n \in \mathbb{N}$ particles on 2 node graph: there is
a random variable with binomial probability distribution $(p, 1 - p)$ and we are interested in
the asymptotic probability that if we use it $n$ times, the first possibility will happen $qn$ times
($p, q \in [0,1]$):

$\binom{n}{qn}p^{qn}(1-p)^{(1-q)n} \approx e^{-n\left(q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}\right)}$

where we have used asymptotic approximation of binomial coefficients from subsection 3.2.
Straightforward generalization is: while using $n$ times probability distribution $(p_i)_i$,
asymptotic probability of getting $(nq_i)_i$ distribution is:

$\Pr((q_i)|(p_i)) \approx e^{-nD_{KL}((q_i)\|(p_i))}$ where $D_{KL}((q_i)\|(p_i)) = \sum_i q_i\ln\frac{q_i}{p_i}$ (68)

is called Kullback-Leibler distance of these probability densities, but usually it is not symmetric.
It is nonnegative and is zero only when $(p_i)$ and $(q_i)$ are equal - as expected, in $n \to \infty$ limit
$(q_i) = (p_i)$ case completely dominates all the others.
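
This exponential rate can be compared with exact binomial probabilities. The sketch below is
only an illustration with arbitrarily chosen $p = 0.3$, $q = 0.5$ and hypothetical function names. It
uses natural logarithms and evaluates $-\frac{1}{n}\ln\Pr$ for growing $n$, which should approach the
Kullback-Leibler distance.

```python
from math import lgamma, log

def log_binomial_probability(n, k, p):
    """Natural logarithm of C(n, k) p^k (1 - p)^(n - k), computed through lgamma."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + k * log(p) + (n - k) * log(1 - p)

def kl_divergence(q, p):
    """Kullback-Leibler distance sum_i q_i ln(q_i / p_i) of two distributions."""
    return sum(q_i * log(q_i / p_i) for q_i, p_i in zip(q, p) if q_i > 0)

p, q = 0.3, 0.5
target = kl_divergence([q, 1 - q], [p, 1 - p])

rates = {}
for n in (100, 1000, 10000, 100000):
    k = round(q * n)                       # the first possibility happens q*n times
    rates[n] = -log_binomial_probability(n, k, p) / n
```

The remaining difference between `rates[n]` and `target` comes from the subexponential factor of the
binomial coefficient and decreases like $\ln(n)/n$.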

The approximation that particles do not interact should be appropriate when their density
is relatively low. Otherwise, some mean field approximation could be used. For
example: find the ground state probability density for a single particle, assume such density
of e.g. electrons to correspondingly modify the potential, then find the ground state for this
modified potential and so on until this iterative procedure stabilizes.

6.2 Fixed number of interacting particles 

Let us now extend the MERW methodology to directly work with fixed number of interacting
walkers/particles.

6.2.1 Distinguishable particles

We will start these considerations with two distinguishable particles on the same graph, but it
can be simply generalized for a larger fixed amount. Position of such couple can be denoted as
$(x_1, x_2) \in \mathcal{V} \times \mathcal{V}$. One way of choosing the adjacency matrix is that transition
from $(x_1, x_2)$ to $(y_1, y_2)$ vertices is allowed if and only if transition is allowed from $x_1$ to $y_1$ and
from $x_2$ to $y_2$ nodes. A different choice is used in Bose-Hubbard model: such transition is
allowed only if a single particle jumps to its neighbor, while all the others remain in their
positions.

The next step is to assign energy to these transitions. If we would choose it as just the sum
of energies of the two original edges, we would make these particles completely independent
(assuming that simultaneously only one of them can make transition would only change time
scale). As in physics, to introduce interaction between them, we can use potential energy
depending on both positions like Coulomb attraction/repulsion. So finally the $\mathcal{M}$ matrix for
this extended graph is not just the tensor product of two copies of the original matrix ($M \otimes M$),
but it additionally has a term describing their interaction, like

$\mathcal{M}_{(x_1,x_2),(y_1,y_2)} = M_{x_1y_1}M_{x_2y_2}\,e^{-\beta V_I(x_1,x_2,y_1,y_2)}$

Figure 18: Example of stationary probability distributions for two particles on segment graph
with self-loops ($M_{ii} = 1$). The bottom row shows $\pi(x) = \sum_y\pi(x,y)$ and its last graph has
additional probability density for the second eigenvector to compare with Pauli exclusion principle.



where for example Vf(x^,X2, 71,72) = ^' '^^^'^^^ for some symmetric potential V^°. 

Now random walk in such extended space ($\mathcal{V} \times \mathcal{V}$) corresponds to thermodynamics of coupled
evolution of these two particles - the dominant eigenvector of $\mathcal{M}$ determines stationary
probability in this extended space like in Fig. 18. If we are only interested in e.g. probability density
of one of these particles, we can sum the joint probability density over some coordinates.
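
The construction on the extended graph can be sketched as follows. This is an illustration only:
the 5-node segment graph with self-loops, the repulsion `strength / (1 + |x1 - x2|)` averaged over
both ends of a transition, and all function names are assumptions introduced here, not taken from the
text. Setting the strength to zero gives the tensor product $M \otimes M$, whose stationary density
factorizes.

```python
from math import exp, sqrt

N = 5        # nodes of the segment graph (illustrative size)
BETA = 1.0

def single_particle_matrix():
    """Segment graph with self-loops, all allowed transitions have weight 1."""
    return [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(N)] for i in range(N)]

def pair_potential(x1, x2, strength):
    """Hypothetical symmetric repulsion, strongest when both particles share a node."""
    return strength / (1 + abs(x1 - x2))

def two_particle_matrix(strength):
    """Tensor product of two single particle matrices times an interaction factor."""
    M = single_particle_matrix()
    states = [(x1, x2) for x1 in range(N) for x2 in range(N)]
    big = []
    for (x1, x2) in states:
        row = []
        for (y1, y2) in states:
            interaction = (pair_potential(x1, x2, strength)
                           + pair_potential(y1, y2, strength)) / 2
            row.append(M[x1][y1] * M[x2][y2] * exp(-BETA * interaction))
        big.append(row)
    return states, big

def stationary_density(matrix, iterations=400):
    """Squared unit-length dominant eigenvector of a symmetric matrix (power method)."""
    size = len(matrix)
    psi = [1.0] * size
    for _ in range(iterations):
        new = [sum(row[j] * psi[j] for j in range(size)) for row in matrix]
        norm = sqrt(sum(v * v for v in new))
        psi = [v / norm for v in new]
    return [v * v for v in psi]

def joint_density(strength):
    states, big = two_particle_matrix(strength)
    return dict(zip(states, stationary_density(big)))

free = joint_density(0.0)          # noninteracting: product of single particle densities
repelling = joint_density(2.0)     # repulsion moves probability away from the diagonal
marginal = [sum(repelling[(x, y)] for y in range(N)) for x in range(N)]
```

Here `marginal` is the density of the first particle alone, obtained by summing the joint density
over positions of the second one.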
In continuous limit we would analogously use eigenfunction equation for $\Psi_{12}(x,y)$:

$\hat{H}|\Psi_{12}\rangle = E|\Psi_{12}\rangle$ where $\hat{H} = \frac{\hat{p}_x^\dagger\hat{p}_x}{2m} + \frac{\hat{p}_y^\dagger\hat{p}_y}{2m} + V(x) + V(y) + V_I(x,y)$

and lower index in momentum operators denotes variable it applies to. Like in quantum
mechanics, for noninteracting particles the eigenfunction can be seen as tensor product
($\Psi_{12}(x,y) = \Psi_1(x)\Psi_2(y)$).



6.2.2 Indistinguishable particles 

For particles we cannot distinguish, $(x,y)$ vertex is for us equivalent to $(y,x)$. It can be used
to reduce the considered number of vertices nearly twice, or generally factorial of the number of
particles times. For this purpose we would use only vertices of coordinates sorted in some
linear order (like $x \leq y$) and identify them with all possible permutations. This reduction of
the number of states in this effective model is not compulsory. Symmetrization is only an abstract
way for representing our information and it does not imply that underlying physics cannot
distinguish particles having different positions.

In discrete case the number of corresponding permutations decreases when some
coordinates equalize (e.g. $x = y$) - we need to be careful there. These vertices have neighbors of
not ordered coordinates - the weights of edges to these vertices cannot just vanish. Instead
these weights should be added to weights of edges to corresponding vertices of ordered
coordinates, like in Fig. 18. Additionally, one needs to remember that obtained probabilities of
nondiagonal vertices correspond to both permutations ($n!$ generally) - to make that density
is just the restricted density of distinguishable case, in Fig. 18 these probabilities were
divided by 2.


The diagonal degenerates in continuous limit, so in this case we can neglect the above
complication. If Hamiltonian is invariant with respect to particle exchange, the fact that $\Psi_{12}(x,y)$
is the dominant eigenfunction (minimizing energy) implies that $\Psi_{12}(y,x)$ also fulfills
eigenfunction equation to the same eigenvalue. This eigenfunction is unique, so it is symmetric
($\Psi_{12}(x,y) = \Psi_{12}(y,x)$). This symmetry is characteristic for bosons in quantum mechanics.
Positivity of $\Psi$ coordinates in MERW does not allow for direct antisymmetrization required
for fermion formalism, but we could for example artificially remove vertices with multiple
particles in the same position (the diagonal for two particles) and edges to them.

However, even without artificially including some Pauli exclusion principle, the repulsion
already makes the positions of these two particles anti-correlated in the dynamical equilibrium
state. This applies both to the MERW approach and to the two-particle quantum wavefunction
($\psi(x,y)$). It is usually approximated by a tensor product of single-particle states, but in
fact the repulsive potential makes the complete two-particle wavefunction avoid configurations
with e.g. electrons being close to each other (barriers of the six-dimensional potential):
the probability density should be much larger when they are on opposite sides of the nucleus.
Placing three electrons this way seems to be much less stable, especially when remembering their
magnetic moments, suggesting that even without the Pauli principle one of them should search
for a different orbital.
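The anti-correlation caused by repulsion can be illustrated with a small numerical sketch. Everything specific here is an assumption made for the example: a 7-node segment, one particle moving per step, the repulsive potential U / (1 + |x - y|), and the step weight exp(-beta (V_before + V_after) / 2).

```python
import numpy as np

def two_particle_density(n, U, beta=1.0):
    """Stationary MERW density psi^2 for two distinguishable particles on an n-node segment."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i + 1] = A[i + 1, i] = 1.0
    I = np.eye(n)
    adjacency = np.kron(A, I) + np.kron(I, A)      # one particle moves per step
    x, y = np.divmod(np.arange(n * n), n)          # coordinates of state x * n + y
    V = U / (1.0 + np.abs(x - y))                  # assumed repulsive potential
    # Assumed Boltzmann weight of a step: exp(-beta * (V_before + V_after) / 2).
    M = adjacency * np.exp(-beta * (V[:, None] + V[None, :]) / 2.0)
    _, vecs = np.linalg.eigh(M)                    # M is symmetric
    psi = vecs[:, -1]                              # dominant eigenvector
    rho = psi**2 / np.sum(psi**2)
    return rho.reshape(n, n)

def position_covariance(rho):
    """Covariance of the two particle positions under the joint density rho."""
    pos = np.arange(rho.shape[0])
    mean_x = np.sum(rho.sum(axis=1) * pos)
    mean_y = np.sum(rho.sum(axis=0) * pos)
    return np.sum(rho * np.outer(pos - mean_x, pos - mean_y))

cov_free = position_covariance(two_particle_density(7, U=0.0))
cov_repelled = position_covariance(two_particle_density(7, U=2.0))
```

Without interaction the joint density factorizes and the covariance vanishes. With repulsion the covariance becomes negative, which is the anti-correlation discussed above.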



6.3 Harmonic oscillator and creation/annihilation operators

Let us briefly look at the standard harmonic oscillator example: a one-dimensional continuous
model with the potential $V(x) = \frac{1}{2} m \omega^2 x^2$ for some angular frequency $\omega$.
Its simplicity, the existence of analytic solutions and the equidistance between energy levels
made it the standard way to introduce the multiple-particle formalism, usually called second
quantization. For example, for a lattice we could model its nodes as approximated by such a
potential. Like in quantum mechanics, in the MERW formalism the harmonic oscillator Hamiltonian
can be decomposed into first-order annihilation ($\hat a$) / creation ($\hat a^\dagger$) operators:

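$$\hat H = -\frac{\hbar^2}{2m}\Delta + \frac{1}{2} m\omega^2 x^2 = \frac{1}{2}\hbar\omega\left(\hat a^\dagger \hat a + \hat a \hat a^\dagger\right) = \hbar\omega\left(\hat a^\dagger \hat a + \frac{1}{2}\right)$$

for real $\hat a$ and $\hat a^\dagger$, because in our case $\hat p = \hbar\nabla = -\hat p^\dagger$.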
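As a worked check of this decomposition, assume that the ladder operators take the usual form with the real momentum operator $\hat p = \hbar\nabla$ substituted. The explicit expressions below are an assumption consistent with the stated Hamiltonian, not a quotation of the original formulas.

```latex
\hat a = \sqrt{\frac{m\omega}{2\hbar}}\left(\hat x + \frac{\hat p}{m\omega}\right), \qquad
\hat a^\dagger = \sqrt{\frac{m\omega}{2\hbar}}\left(\hat x - \frac{\hat p}{m\omega}\right), \qquad
[\hat x, \hat p] = [x, \hbar\nabla] = -\hbar

\hat a^\dagger \hat a
  = \frac{m\omega}{2\hbar}\left(\hat x^2 - \frac{\hat p^2}{m^2\omega^2} + \frac{[\hat x, \hat p]}{m\omega}\right)
  = \frac{1}{\hbar\omega}\left(\frac{1}{2} m\omega^2 x^2 - \frac{\hbar^2}{2m}\Delta\right) - \frac{1}{2}

[\hat a, \hat a^\dagger]
  = \frac{m\omega}{2\hbar}\cdot\frac{-2\,[\hat x, \hat p]}{m\omega} = 1
```

Under this assumption $\hbar\omega(\hat a^\dagger \hat a + \frac{1}{2})$ reproduces the Hamiltonian, and the operators contain no imaginary unit.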
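Eigenfunctions of this Hamiltonian are known:

$$\langle x|n\rangle = \frac{\pi^{-1/4}}{\sqrt{2^n n!}}\, e^{-x^2/2} H_n(x) \qquad \hat a|n\rangle = \sqrt{n}\,|n-1\rangle \qquad \hat a^\dagger|n-1\rangle = \sqrt{n}\,|n\rangle$$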

where $H_n$ is the n-th Hermite polynomial. This time, for a single particle only the lowest state
($|0\rangle$) is thermodynamically stable: excited states would deexcite to this ground state.
However, when there are multiple repelling particles, they should choose distant thermodynamical
equilibria. For example, Fig. 18 suggests that for two repelling particles the probability density
of each of them is similar to that of the second lowest energy eigenstate ($|1\rangle$). However,
we should keep in mind that quantum or MERW amplitudes for interacting particles are a bit
different than for noninteracting ones.
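The eigenfunction formula and the ladder relations can be checked numerically. The sketch below assumes units in which $m\omega/\hbar = 1$, so that $\hat a = (x + d/dx)/\sqrt{2}$. The grid and the helper name `oscillator_state` are choices made only for this example.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermval

def oscillator_state(n, x):
    """<x|n> in units where m * omega / hbar = 1 (physicists' Hermite polynomials)."""
    coeffs = [0.0] * n + [1.0]                      # selects H_n in hermval
    norm = math.pi ** -0.25 / math.sqrt(2.0**n * math.factorial(n))
    return norm * np.exp(-x**2 / 2.0) * hermval(x, coeffs)

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]
states = np.array([oscillator_state(n, x) for n in range(4)])

# Overlap matrix <m|n> by a simple Riemann sum; it should be close to the identity.
overlaps = states @ states.T * dx

# a = (x + d/dx) / sqrt(2) in these units, with a finite-difference derivative.
lowered = (x * states + np.gradient(states, dx, axis=1)) / math.sqrt(2.0)
```

Row `n` of `lowered` should then agree with `sqrt(n) * states[n - 1]` up to the finite-difference error, and row 0 should vanish.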

Annihilation/creation operators and $|n\rangle$ states can also be used as a purely abstract way to
work on multiple-particle states/nodes. So let us disregard the underlying physics for now and
find the universal combinatorial coefficients by considering a graph of a single node which can
contain a varying number of particles (n). For a clearer picture, let us assume for a moment that
they are distinguishable: the transition from the n to the n - 1 particle state can be made by
removing one of n particles, in n ways. Transforming one of these subsets back into the original
n particle state can be made in only one way:

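$$\hat a|\tilde n\rangle = n\,|\widetilde{n-1}\rangle \qquad \hat a^\dagger|\widetilde{n-1}\rangle = |\tilde n\rangle$$

where $|\tilde n\rangle$ is a nonstandard normalization of the n particle vertex: this time it
represents the sum of n! permuted states. However, it already leads to the proper commutation
relations:

$$\hat n|\tilde n\rangle := \hat a^\dagger \hat a|\tilde n\rangle = n|\tilde n\rangle \qquad \hat a \hat a^\dagger|\tilde n\rangle = (n+1)|\tilde n\rangle \qquad [\hat a, \hat a^\dagger] = 1 \qquad (70)$$

The standard normalization of these states is $1/\sqrt{n!}$ times the sum of n! permutation states:
$|\tilde n\rangle = \sqrt{n!}\,|n\rangle$, leading to the relations used in quantum mechanics:

$$\hat a|n\rangle = \sqrt{n}\,|n-1\rangle \qquad \hat a^\dagger|n-1\rangle = \sqrt{n}\,|n\rangle \qquad \frac{1}{\sqrt{n!}}(\hat a^\dagger)^n|0\rangle = |n\rangle \qquad (71)$$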
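Both normalizations can be compared with small truncated matrices. This is a sketch only: the cutoff N is an artifact of the example, so the commutation relation can hold only below the highest kept state.

```python
import math
import numpy as np

N = 6  # keep states with 0..N-1 particles (the truncation is an artifact of this sketch)

# Combinatorial normalization: a|n~> = n |n-1~>,  a^dagger |n-1~> = |n~>.
a_comb = np.zeros((N, N))
adag_comb = np.zeros((N, N))
for n in range(1, N):
    a_comb[n - 1, n] = n        # n ways to remove one of n particles
    adag_comb[n, n - 1] = 1     # one way to put it back

# The change of basis |n~> = sqrt(n!) |n> turns them into the standard operators.
D = np.diag([math.sqrt(math.factorial(n)) for n in range(N)])
D_inv = np.linalg.inv(D)
a_std = D @ a_comb @ D_inv
adag_std = D @ adag_comb @ D_inv

commutator = a_comb @ adag_comb - adag_comb @ a_comb
number_op = adag_comb @ a_comb
```

The rescaled matrices have the familiar $\sqrt{n}$ entries, while the commutator and the number operator are already correct in the combinatorial normalization.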

6.4 Various number of particles and Bose-Hubbard model 

To consider n particles on a graph $(\mathcal V, \mathcal E)$, one can use the extended graph
$\mathcal V^n$ with correspondingly defined transitions. To include a varying number of particles,
the final graph becomes the union of such graphs, between which we need to define the allowed
transitions (e.g. using the $\hat a^\dagger / \hat a$ formalism). Then the potential allows us to
choose a new M matrix for such an extended graph. Additionally, a chemical potential (or rest mass
in QFT) could be required to include the change of energy of the system as the number of particles
increases. Finally, for example, the dominant eigenvector of M determines the stationary
probability distribution among all vertices of this union of $\mathcal V^n$.
Figure 19: Example of an extended graph for fermions on a 4-node segment-like graph, such that in
one step a single particle can move by one position or be annihilated/created. Each of the
$2^4 = 16$ vertices represents the situation at some moment; the stationary probability
distribution is among these 16 vertices.

For indistinguishable particles there is a more convenient way of representing multiple-particle
states: storing only the number of particles in each vertex. Vertices of such extended graphs are
functions from $\mathcal V$ to the set of possible numbers of particles in a single vertex. For
fermions there should be at most one ($\mathcal V \to \{0, 1\}$), for bosons there can be any
natural number of particles in a node ($\mathcal V \to \mathbb N$). The boson picture is also used
with a repulsive interaction to approximate fermion behavior.
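The occupation-number representation can be sketched by enumerating the vertices directly. The transition rule used for the fermion graph (a hop by one position, or creation/annihilation of a single particle) is an assumed reading of the Fig. 19 caption, and all function names are hypothetical.

```python
import itertools
import math
import numpy as np

def fermion_vertices(num_nodes):
    """All occupation functions V -> {0, 1}."""
    return list(itertools.product((0, 1), repeat=num_nodes))

def boson_vertices(num_nodes, num_particles):
    """All occupation functions V -> N with a fixed total number of particles."""
    return [occ for occ in itertools.product(range(num_particles + 1), repeat=num_nodes)
            if sum(occ) == num_particles]

def fermion_segment_graph(num_nodes):
    """Adjacency matrix of the extended graph on a segment (assumed transition rule)."""
    verts = fermion_vertices(num_nodes)
    index = {v: k for k, v in enumerate(verts)}
    M = np.zeros((len(verts), len(verts)))
    for v in verts:
        for i in range(num_nodes):
            flipped = list(v)
            flipped[i] = 1 - v[i]                         # create or annihilate at node i
            M[index[v], index[tuple(flipped)]] = 1.0
            if i + 1 < num_nodes and v[i] != v[i + 1]:
                hopped = list(v)
                hopped[i], hopped[i + 1] = v[i + 1], v[i]  # hop between nodes i and i+1
                M[index[v], index[tuple(hopped)]] = 1.0
    return verts, M

verts, M = fermion_segment_graph(4)
psi = np.abs(np.linalg.eigh(M)[1][:, -1])    # dominant eigenvector of the symmetric M
rho = psi**2 / np.sum(psi**2)                # stationary MERW density over the vertices
```

For 4 nodes this gives the 16 fermion vertices mentioned in the caption. The boson enumeration for a fixed particle number grows as a binomial coefficient.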

We are now prepared to look at the Bose-Hubbard model [21], proposed and used for example in
solid state physics or for optical lattices. The main interest is in the only stable state: the
ground state, to which the others would deexcite. The simplest Hamiltonian used for an underlying
graph $(\mathcal V, \mathcal E)$ (usually a regular lattice) is:

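$$\hat H_{BH} := -t \sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j + \frac{U}{2}\sum_{i\in\mathcal V} \hat n_i(\hat n_i - 1) \qquad (72)$$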

The model parameter t sets the transition probability (the larger t, the less important the other
terms are), and U is the additional energy for two particles being in a single node. Usually the
Hermitian conjugate of the first term is also added. This can be done by just using an undirected
graph ($(i, j) \in \mathcal E \Rightarrow (j, i) \in \mathcal E$), so the presented form is more
general. This first term defines the adjacency matrix of the extended graph of possible
situations: the allowed transitions shift a single particle to a neighboring node.

The second term introduces local repulsion for this bosonic model (used as an approximation
for fermions): additional energy is required to place more than one particle in a single node.
A chemical potential term is often added to compensate for a varying number of particles, but
since the number of creation and annihilation operators in each term is equal, the number of
particles is constant in this case. This model can be extended for example by including the local
energy of nodes ($V(i)$) or interaction between particles ($V(i,j)$).
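The Bose-Hubbard Hamiltonian can be written as an explicit matrix for a small graph. This is a minimal sketch, assuming a fixed particle number and the occupation-number basis. The function name `bose_hubbard` and the 3-node segment are choices made for the example.

```python
import itertools
import math
import numpy as np

def bose_hubbard(edges, num_nodes, num_particles, t=1.0, U=1.0):
    """Matrix of H = -t sum_{(i,j) in edges} a_i^dag a_j + U/2 sum_i n_i (n_i - 1)
    in the occupation-number basis with a fixed number of particles."""
    basis = [occ for occ in itertools.product(range(num_particles + 1), repeat=num_nodes)
             if sum(occ) == num_particles]
    index = {occ: k for k, occ in enumerate(basis)}
    H = np.zeros((len(basis), len(basis)))
    for occ in basis:
        col = index[occ]
        H[col, col] += 0.5 * U * sum(n * (n - 1) for n in occ)
        for (i, j) in edges:                       # a_i^dag a_j moves a particle j -> i
            if occ[j] == 0:
                continue
            new = list(occ)
            new[j] -= 1
            new[i] += 1
            amplitude = math.sqrt(occ[j] * (occ[i] + 1))
            H[index[tuple(new)], col] += -t * amplitude
    return basis, H

# Undirected 3-node segment: both directions of every edge are listed.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
_, H1 = bose_hubbard(edges, 3, 1)
_, H2_free = bose_hubbard(edges, 3, 2, U=0.0)
basis2, H2 = bose_hubbard(edges, 3, 2, U=4.0)
E1 = np.linalg.eigvalsh(H1)[0]
E2_free = np.linalg.eigvalsh(H2_free)[0]
E2 = np.linalg.eigvalsh(H2)[0]
```

Each hopping term moves one particle and removes none, so the particle number is conserved by construction. For U = 0 the two-boson ground energy is twice the single-particle one, and a positive U raises it.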

Let us now look at the situation from the MERW perspective. If there is only a single particle,
the graph of possibilities is exactly the original graph $(\mathcal V, \mathcal E)$. In such a case
interactions vanish and the original Bose-Hubbard Hamiltonian becomes equivalent to $-tM$, where M
is the standard adjacency matrix used in the basic MERW, and this purely thermodynamical model also
predicts the statistical equilibrium to be as for the quantum ground state of the Hamiltonian. The
other terms do not change the vertex. Their direct interpretation is a self-loop like in subsection
3.4: using only transition terms costs zero energy, but each time the particle would like
to stay in a vertex, the system would have to pay in energy.
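The single-particle correspondence can be checked on a small graph. The sketch assumes a symmetric, connected adjacency matrix and uses the basic MERW transition probabilities $S_{ij} = M_{ij}\psi_j / (\lambda\psi_i)$. The example graph is hypothetical.

```python
import numpy as np

def merw(M):
    """MERW transition matrix S and stationary density rho for a symmetric,
    connected adjacency matrix M."""
    eigvals, eigvecs = np.linalg.eigh(M)
    lam = eigvals[-1]
    psi = np.abs(eigvecs[:, -1])                   # dominant eigenvector, made positive
    S = M * psi[None, :] / (lam * psi[:, None])    # S_ij = M_ij psi_j / (lambda psi_i)
    rho = psi**2 / np.sum(psi**2)
    return S, rho, psi

# Hypothetical example graph: a 6-node segment with one extra edge.
M = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 3)]:
    M[i, j] = M[j, i] = 1.0

S, rho, psi = merw(M)
t = 1.0
ground = np.linalg.eigh(-t * M)[1][:, 0]           # ground state of H = -t M
```

Here `rho` is stationary under `S` and coincides with the squared ground-state amplitudes of $-tM$.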

The MERW approach suggests realizing these terms in a slightly different way. Specifically, the
Boltzmann distribution among paths requires multiplying the Hamiltonian transition terms by
$e^{-\epsilon\beta V}$, where $\epsilon$ is the time step and V generally may depend on the whole
configuration of particles before and after the transition, for example expressed using $\hat n$
operators like the interactions in the Bose-Hubbard model. In analogy to taking the continuous
limit of the lattice in subsection 4.2, for a small lattice constant we can approximate it to
first order in $\epsilon$ by the Bose-Hubbard Hamiltonian:

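$$\hat H = -\sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j\, e^{-\epsilon\beta V(\text{configuration before and after transition})} \approx -\sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j + \epsilon\beta d \sum_{j\in\mathcal V} V(\text{configuration after transition})\, \hat a_j^\dagger \hat a_j$$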

where d is the vertex degree, constant for a lattice. The first approximation is using only the
linear expansion of the exponent, $e^{-\epsilon\beta V} \approx 1 - \epsilon\beta V$; the second is
that V is smooth, having practically the same values in neighboring vertices. The third
approximation is that the dominant eigenvector is nearly constant in neighboring vertices. It
becomes more complicated now, as it requires a modification of the indexes of annihilation
operators ($\hat a_i^\dagger \hat a_j \to \hat a_j^\dagger \hat a_j$), but it should not
essentially change the dominant eigenvector we are interested in. By choosing V we can now get
the Bose-Hubbard Hamiltonian as required.
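The three approximations can be written out step by step. The shorthand $V_{ij}$ for the potential of a transition and $V_j$ for its value at the target configuration, as well as the final choice of V, are illustrative assumptions and not taken from the original derivation.

```latex
\begin{align*}
-\sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j\, e^{-\epsilon\beta V_{ij}}
 &\approx -\sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j
   + \epsilon\beta \sum_{(i,j)\in\mathcal E} V_{ij}\, \hat a_i^\dagger \hat a_j
   && \text{(linear expansion of the exponent)} \\
 &\approx -\sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j
   + \epsilon\beta \sum_{j\in\mathcal V} V_j \sum_{i:\,(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j
   && \text{(smooth } V\text{)} \\
 &\approx -\sum_{(i,j)\in\mathcal E} \hat a_i^\dagger \hat a_j
   + \epsilon\beta\, d \sum_{j\in\mathcal V} V_j\, \hat a_j^\dagger \hat a_j
   && \text{(slowly varying dominant eigenvector)}
\end{align*}
```

For instance, taking $\epsilon\beta\, d\, V_j = \frac{U}{2}(\hat n_j - 1)$ turns the last sum into $\frac{U}{2}\sum_j \hat n_j(\hat n_j - 1)$, which is the interaction term of (72) up to the overall factor t in the hopping term.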

To summarize, the MERW approach leads to a Hamiltonian which is practically equivalent
to the Bose-Hubbard Hamiltonian if there is no potential/interaction, or in the continuous limit.
In the general case they only approximate each other. The Bose-Hubbard Hamiltonian seems to
have been introduced by direct analogy to the continuous case, while for MERW it is derived from
a model which mathematically is nearly Feynman path integrals in imaginary time. The obtained
difference suggests investigating the process of translating continuous models to discrete
cases.



The next step should be taking the infinitesimal limit for a varying number of particles to get a
thermodynamical analogue of Quantum Field Theories. There appear technical difficulties for the
direct approach, for example eigenstates of the thermodynamical analogue of the momentum operator
($\hbar\nabla$) are exponentially growing or vanishing. They are very different from plane waves,
which are unavailable for MERW ($\psi > 0$). This suggests using the Laplace transform instead of
the Fourier transform for momentum space, but it seems problematic. Future development of such
MERW expansions could bring some better intuition about the many problematic infinities appearing
in QFT. Anyway, from the mathematical point of view, the MERW approach is very similar to using
quantum mechanics in imaginary time, which is popular for finding the ground state. For
example, the expansion into Feynman graphs expressed by a sequence of annihilation/creation
operators becomes in imaginary time a canonical ensemble of situations described as a graph
and its parameters: the Boltzmann distribution among four-dimensional scenarios which MERW is
constructed on.
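The use of imaginary time for finding the ground state can be sketched as repeated application of a short imaginary-time step. The lattice Hamiltonian, the potential and the step size below are hypothetical choices, and the first-order step $1 - \tau H$ stands in for $e^{-\tau H}$.

```python
import numpy as np

n = 21
x = np.arange(n) - n // 2
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
H = -A + np.diag(0.1 * x**2)       # hypothetical lattice Hamiltonian with a harmonic potential

tau = 0.1                          # small imaginary-time step
step = np.eye(n) - tau * H         # first-order approximation of exp(-tau * H)
psi = np.ones(n)                   # any positive starting vector
for _ in range(2000):
    psi = step @ psi
    psi /= np.linalg.norm(psi)     # renormalize; only the direction matters

exact = np.abs(np.linalg.eigh(H)[1][:, 0])   # ground state from direct diagonalization
```

Excited components decay exponentially in the number of steps, so `psi` converges to the positive ground state, whose square is the stationary density.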

7 Conclusions and further perspectives 

The basic formalism, properties and intuitive examples of MERW-based thermodynamical modeling
were introduced and discussed. These thermodynamical motion models still require
a lot of work to develop them into a mature complement to the standard approaches, but they
already seem to explain why in some situations standard thermodynamics disagrees with
experimental and quantum predictions, suggesting a way to correct it. The standard way of
repairing the inconsistencies of stochastic models is to introduce some anomalous diffusion,
but usually without proposing an explaining mechanism. The explanation given by the presented
approach is that the standard models only seem to fulfill thermodynamical principles, while
in fact they are often biased: against the maximal uncertainty principle, they unknowingly
emphasize some possible scenarios without a basis for such an assumption. Equivalently, instead
of using ensembles of static scenarios, for agreement with the thermodynamical predictions of
quantum mechanics we should consider ensembles of dynamical ones: trajectories, histories.

For the MERW Hamiltonian to become the Schrödinger operator, a fixed noise level was not
required, but the similarity to the quantum formalism for the time-dependent case suggested
choosing $\beta = 1/\hbar$. The main source of this fundamental noise seems to be the wave nature
of particles, for example as a result of de Broglie's internal clock. This thermodynamical model
completely ignores other effects related to the wave nature, which is generally seen as the
essence of quantum mechanics, required for interference or orbit quantization. However,
surprisingly, the structure of eigenstates of the Hamiltonian already appears, but in a
different way: as temporary attractors for the probability density, which might become permanent
if lower eigenstates are somehow restricted, for example by being occupied by repelling
particles. A deep understanding of such a thermodynamical analogue of the Pauli exclusion
principle is one of the most important further lines of work. Another essential property ignored
by this thermodynamical approach is energy conservation, which in physical deexcitation
results in photon production. These shortcomings could be reduced by considering not the nowhere
differentiable (diffusive) trajectories presented here, but more physical ones: smooth and
being perturbations of classical trajectories. Another possible line of development is for
example the infinitesimal limit of a Bose-Hubbard-like model to get a better understanding of
e.g. the problematic infinities of Quantum Field Theories.

Richard Feynman has written: "I think I can safely say that nobody understands quantum
mechanics". While recent experiments of Couder ([]8D) showed that double-slit interference
can be observed and understood for classical macroscopic objects having wave-particle
duality, the surprising agreement of the presented approach with the thermodynamical predictions
of quantum mechanics seems to complement the picture. There appears new hope to make
quantum mechanics not only a tool for calculations, but also a theory with deeply understood
foundations. The Schrödinger's cat thought experiment already suggests that quantum
mechanics is a theory representing the information of a subjective observer: while for an
external observer the cat may be in a quantum superposition, a different observer inside the box
may finally get out and show a video recording of what has objectively happened inside the
box. De Broglie's interpretation and the MERW-based approach suggest that we can see
quantum mechanics not as a fundamental assumed theory, but as an emergent effective one: a result
of the wave nature of particles and of thermodynamics representing the most probable behavior for
our limited knowledge. While the modulus of the wavefunction describes probability, its argument
describes the expected relative phase of such a particle's internal clock.

While the Schrödinger equation focuses on the wave nature, ignoring the corpuscular one, the
presented approach does exactly the opposite, so it would be useful to construct a model joining
them. For this purpose a model with energy conservation is required, in which
a varying number of local particle-like constructs can naturally appear, constantly creating
waves in the surrounding field like Couder's walking droplets. Their wave nature would
lead to effects typical for quantum mechanics like interference, while the MERW-based
thermodynamical view suggests an additional stochastic shift toward the probability densities of
quantum eigenstates. Particles' quantum numbers, more or less conserved properties being integer
multiplicities, suggest where to search for the corresponding mathematical constructions:
topological charges like the winding number are also integer multiplicities and are restricted
by corresponding conservation laws. Such topological solitons can be created/annihilated
in pairs of opposite "charges", there can appear attraction/repulsion for opposite/the same
topological charge, they store some characteristic minimal rest energy (mass) required
to glue such nontrivial boundary conditions, some of them, like the so-called breathers, may have
a "clock" (an internal periodic process creating waves around them), and so on. Skyrme ([22])
popularized using topological solitons as effective models of hadrons, but for example
Couder's experiments and the presented thermodynamical approach suggest trying to use solitons
as constructs for all particles. For example, such an electron model is being developed
(Faber [23]), in which electromagnetic interaction appears naturally, but this approach
does not include the wave nature. Recently a model was introduced which can be seen
as its expansion by a single degree of freedom interpreted as quantum phase, which adds
the possibility of an internal clock (ellipsoid field [15]). Its family of topological solitons
also grows to become qualitatively similar to that known from physics: with three families of
solitons resembling correspondingly neutrinos, leptons, mesons and baryons, which can finally
combine into nucleus-like structures. Qualitatively it has a surprising agreement with the
quantum numbers, interactions, decays and mass hierarchy expected in physics. To test this
promising-looking approach, many more quantitative considerations will now be required.

I would like to thank Andrzej Horzela for discussion and help in proofreading of this paper 